Current Projects

  • Recursive Data Mining for Role Identification

    Abstract: Like paintings and verbal dialogues, every written document exhibits the author’s distinctive style. Identifying such style has many practical applications, including fraud detection, author attribution, and user-centric personalization. The general task of finding distinctive features has much broader scientific implications, ranging from history to bioinformatics. In this paper, we focus on capturing patterns in electronic documents. The approach discovers patterns at varying degrees of abstraction, in a hierarchical fashion. The discovered patterns capture the stylistic characteristics of the author and are used as features to build efficient classifiers. Because of the recursive nature of the pattern discovery process, we call our approach Recursive Data Mining. The discovered patterns allow for a certain degree of approximation, which is necessary for capturing non-trivial patterns in realistic datasets. To substantiate our methodology, we conduct experiments on the Enron dataset, in which members are categorized into organizational roles. The results show that a Naive Bayes classifier that uses the dominant patterns discovered by Recursive Data Mining (including its level-0 tokens, the words of email messages) performs well in role detection.
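
    The hierarchical discovery step described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes exact matching of adjacent token pairs (the paper allows approximate patterns), and the function names, the min_support threshold, and the max_levels cap are all hypothetical choices made for illustration. Frequent pairs found at one level are collapsed into single composite tokens, and the rewritten sequences become the input to the next level.

```python
from collections import Counter


def mine_level(sequences, min_support):
    """Return the adjacent token pairs that occur at least min_support times."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return {pair for pair, n in counts.items() if n >= min_support}


def rewrite(seq, patterns):
    """Greedily replace each frequent pair, left to right, with one composite token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) in patterns:
            out.append((seq[i], seq[i + 1]))  # composite token for the next level
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def recursive_mine(sequences, min_support=2, max_levels=3):
    """Mine patterns level by level; level-0 tokens are the words themselves."""
    levels = []
    current = [list(s) for s in sequences]
    for _ in range(max_levels):
        patterns = mine_level(current, min_support)
        if not patterns:
            break  # nothing frequent enough to abstract further
        levels.append(patterns)
        current = [rewrite(s, patterns) for s in current]
    return levels, current


if __name__ == "__main__":
    # Toy messages, invented for illustration only.
    docs = [
        ["please", "review", "the", "report"],
        ["please", "review", "the", "budget"],
    ]
    levels, rewritten = recursive_mine(docs)
    for depth, patterns in enumerate(levels, start=1):
        print(depth, sorted(patterns, key=str))
```

    In a pipeline like the one the abstract outlines, counts of the patterns found at each level, together with the level-0 words, would serve as the feature vector passed to a classifier such as Naive Bayes.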


  • Recursive Data Mining for Protein Family

    Abstract: Coming soon