I am broadly interested in the algorithmic and applied aspects of data mining and machine learning. Specifically, I have worked on algorithms for mining structural patterns and on machine learning and graph-theoretic approaches to link analysis and prediction. I am also interested in algorithms and techniques for text classification.

Within pattern mining, I am interested in algorithms for mining closed, maximal, and approximate patterns. Along with the practical aspects of pattern mining, I am also interested in the formal concepts related to these patterns. As an initial effort in this direction, I have been a key contributor to the Data Mining Template Library (http://dmtl.sourceforge.net/). The library provides a generic framework for mining a wide range of frequent patterns, and it has been developed to handle datasets as large as 60 GB. In another direction within pattern mining, I am currently building a framework for addressing the authorship attribution problem. The idea is to identify discriminating sequence patterns in text documents. The patterns found in a document form the signature for that document, and documents with similar signatures are attributed to the same author.
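To make the signature idea concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions, not the framework itself: repeated word n-grams stand in for discriminating sequence patterns, Jaccard overlap stands in for signature similarity, and the names `signature` and `jaccard` are hypothetical.

```python
from collections import Counter


def signature(text, n=2, min_count=2):
    """Return the set of word n-grams that occur at least min_count times.

    Assumption: repeated contiguous n-grams are a crude stand-in for the
    discriminating sequence patterns a real miner would extract.
    """
    words = text.lower().split()
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {gram for gram, count in grams.items() if count >= min_count}


def jaccard(a, b):
    """Jaccard similarity of two signatures (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy documents: doc_a and doc_b share their repeated phrases, doc_c does not.
doc_a = "the cat sat on the mat and the cat sat on the rug"
doc_b = "the cat sat on the sofa while the cat sat on the chair"
doc_c = "a dog ran in a park and a dog ran in a field"

sig_a, sig_b, sig_c = signature(doc_a), signature(doc_b), signature(doc_c)
sim_ab = jaccard(sig_a, sig_b)
sim_ac = jaccard(sig_a, sig_c)
```

In this toy setting, documents whose similarity exceeds some chosen threshold would be grouped under one author. A realistic system would mine patterns that separate authors rather than patterns that are merely frequent.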

I am also interested in social network analysis and link prediction problems applied to various kinds of network structures, such as co-authorship and citation graphs. In a previous project, we applied supervised learning and Markov methods to the task of predicting links in citation and co-authorship networks. I am currently involved in a project that aims to discover new interactions in protein-protein interaction networks.

As part of my Master's thesis, I worked on a feature splitting technique under the co-training setting proposed earlier by Blum and Mitchell. Co-training is a semi-supervised learning algorithm that improves a classifier by exploiting unlabeled data when only a small amount of labeled training data is available. Also during my Master's, I worked on unsupervised learning techniques for detecting novel events in video streams.
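The co-training loop can be sketched roughly as follows. This is a simplified illustration, not the feature splitting technique from the thesis: the two views come from a fixed index split (`split`, an assumed parameter), a nearest-centroid classifier stands in for whatever base learner is used, and each view labels its single most confident example per round rather than a fixed number of positives and negatives. All function names are hypothetical.

```python
def centroid_fit(X, y):
    """Fit a nearest-centroid model: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_predict(model, x):
    """Return (label, margin); margin is the distance gap to the runner-up."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, mean)), c) for c, mean in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(labeled, unlabeled, split, rounds=5, per_round=1):
    """Grow the labeled set by letting each view label its most confident examples."""
    labeled, pool = list(labeled), list(unlabeled)
    views = (slice(0, split), slice(split, None))  # assumed fixed feature split
    for _ in range(rounds):
        for view in views:
            if not pool:
                break
            model = centroid_fit([x[view] for x, _ in labeled], [y for _, y in labeled])
            scored = [(centroid_predict(model, x[view]), x) for x in pool]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            for (label, _), x in scored[:per_round]:
                labeled.append((x, label))  # pseudo-label becomes training data
                pool.remove(x)
    models = [
        centroid_fit([x[v] for x, _ in labeled], [y for _, y in labeled]) for v in views
    ]
    return models, labeled


# Two labeled seeds and four unlabeled points with four features (two per view).
seeds = [([0.0, 0.0, 0.0, 0.0], 0), ([5.0, 5.0, 5.0, 5.0], 1)]
unlabeled = [
    [0.5, 0.2, 0.1, 0.4],
    [4.8, 5.1, 4.9, 5.2],
    [1.0, 0.5, 0.8, 0.3],
    [4.2, 4.5, 4.0, 4.9],
]
models, grown = co_train(seeds, unlabeled, split=2)
```

Each view's classifier effectively teaches the other: an example that one view labels confidently becomes training data for both views in the next step.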