My work centers on data mining, the science of discovering latent and useful knowledge in large databases (also known as KDD - Knowledge Discovery and Data Mining). As a member of Prof. Zaki's group (Data Mining Template Library - DMTL, see Projects below), I am currently working on frequent pattern mining (FPM). FPM is a class of data mining problems concerned with finding frequently occurring patterns (e.g. itemsets, sequences, trees or graphs) in massive relational databases. Unfortunately, FPM (and data mining in general) is not well documented, largely because data mining is a relatively new field that is developing rapidly. Some useful pointers on data mining:

Frequent patterns aren't always "interesting", but deciding which ones are is a separate problem. Our group is working towards a unified and generic framework for FPM. Our library currently has efficient and functional modules for itemsets, sequences, trees and graphs. My work in the group has largely revolved around tree and graph mining (see references below for relevant papers). Graph mining is particularly challenging because of the inherent complexity of graphs, which raises issues like graph isomorphism (a problem known to be in NP, but not known to be either solvable in polynomial time or NP-complete). We aim to have an open source release of our library soon - watch this space!
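To make the idea of frequent pattern mining concrete, here is a minimal sketch of the simplest case, frequent itemset mining, using a textbook level-wise (Apriori-style) search. It is only an illustration and is not the DMTL implementation or its API; the function name `frequent_itemsets`, the `min_support` parameter and the toy transactions are all assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    min_support transactions (hypothetical helper, Apriori-style search)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)

        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}

        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    for itemset, count in sorted(
        frequent_itemsets(baskets, min_support=2).items(),
        key=lambda kv: (len(kv[0]), sorted(kv[0])),
    ):
        print(sorted(itemset), count)
```

The pruning step relies on the fact that every subset of a frequent itemset must itself be frequent. Sequences, trees and graphs need more involved candidate generation and matching; for graphs, that is where isomorphism checks come in.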

References - (please also check the publications section)


Projects