Parallel Data Mining
Data mining is the search for patterns in a dataset. Because modern-day databases are massive and growing exponentially,
there is a need for efficient parallel algorithms. We designed and implemented a distributed algorithm to identify frequent patterns in a
transactional database. Our approach is an extension of Han et al.'s FP-Growth algorithm, and it scales extremely well with additional processors.
The code was implemented in C using MPI, with MPI-2 used for
file handling. The results reported in the papers below were generated on a
14-processor HP-9000/800 platform.
A. Javed and A. Khokhar, "Frequent Pattern Mining on Message Passing Multiprocessor Systems," Distributed and Parallel Databases: An International Journal (DAPD), November 2004.
A. Javed and A. Khokhar, "Scalable Parallel Algorithm for Mining Frequent Patterns on Message Passing Parallel Systems," ISCA Parallel and Distributed Computing Systems (PDCS), August 2003.
(The image is Diego Rivera's Miners in Guerrero.)