SPIDER: Scalable, Parallel and Interactive Data Mining and
Exploration at Rensselaer
The goal of our research is to develop a high-performance
data mining system (HPDM) that can manipulate very large
scientific databases. The research pursues an application-oriented
approach with a special focus on bioinformatics (e.g., protein structure
prediction). The HPDM system is based on a three-tiered
architecture consisting of a front-end interface, visualization, and
query tool; a middle-layer Data Mining Template Library of common
high-level mining algorithms and a core set of data mining "primitive
operations"; and a back-end Extensible Data Mining System that is tightly
integrated with a database system and delivers high performance.
Research
Our ultimate goal is to develop a fully functional
HPDM toolkit for massive databases. We are exploring
and unifying two dominant frameworks:
- Develop a Data Mining Template Library (analogous to
the Standard Template Library in C++)
- Develop an Extensible Data Mining Server (analogous
to a database management system)
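To illustrate the STL analogy, here is a minimal C++ sketch of a support-counting routine written once as a template and reused for different pattern types. All names here (`SubsetContainment`, `SubsequenceContainment`, `support`) are hypothetical and chosen for illustration; they are not taken from the DMTL code, and the real library may be organized quite differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names throughout; this is NOT the DMTL API.

// Containment test for itemsets. Assumes both ranges are sorted.
struct SubsetContainment {
    template <typename Pattern>
    bool operator()(const Pattern& transaction, const Pattern& pattern) const {
        return std::includes(transaction.begin(), transaction.end(),
                             pattern.begin(), pattern.end());
    }
};

// Containment test for sequences: the pattern must occur as a
// subsequence (order preserved, gaps allowed).
struct SubsequenceContainment {
    template <typename Pattern>
    bool operator()(const Pattern& transaction, const Pattern& pattern) const {
        auto it = transaction.begin();
        for (const auto& item : pattern) {
            it = std::find(it, transaction.end(), item);
            if (it == transaction.end()) return false;
            ++it;  // continue searching after the matched position
        }
        return true;
    }
};

// Generic support counter: the algorithm is written once and the
// pattern type and containment test are supplied as template parameters.
template <typename Pattern, typename Contains>
std::size_t support(const std::vector<Pattern>& database,
                    const Pattern& pattern, Contains contains) {
    return static_cast<std::size_t>(
        std::count_if(database.begin(), database.end(),
                      [&](const Pattern& t) { return contains(t, pattern); }));
}
```

The same `support` template can then be called with `SubsetContainment{}` on a database of sorted `std::vector<int>` itemsets, or with `SubsequenceContainment{}` on a database of `std::string` sequences, in the way that STL algorithms work over any container that offers iterators.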
Our current
accomplishments include:
- Completed first prototype of our general-purpose Data Mining
Template Library (DMTL).
- DMTL currently supports frequent pattern mining
over itemsets, sequences, trees, and graphs.
- DMTL supports persistence of mined patterns and temporary results.
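As a rough illustration of frequent pattern mining for the simplest pattern type, itemsets, the following C++ sketch performs a level-wise (Apriori-style) search. It is a generic textbook-style sketch, not DMTL's implementation: the names `Itemset`, `count_support`, and `frequent_itemsets` are hypothetical, transactions are assumed to be sorted and free of duplicates, and the usual subset-pruning step is omitted for brevity, so every joined candidate is counted directly.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical types and functions; this is NOT the DMTL API.
using Itemset = std::vector<int>;  // assumed sorted, no duplicates

// Number of transactions that contain every item of `candidate`.
std::size_t count_support(const std::vector<Itemset>& db,
                          const Itemset& candidate) {
    std::size_t n = 0;
    for (const auto& t : db)
        if (std::includes(t.begin(), t.end(),
                          candidate.begin(), candidate.end()))
            ++n;
    return n;
}

// Level-wise search: returns each itemset whose support is at least
// `min_support`, mapped to that support.
std::map<Itemset, std::size_t> frequent_itemsets(
        const std::vector<Itemset>& db, std::size_t min_support) {
    std::map<Itemset, std::size_t> result;

    // Level 1: frequent single items, visited in ascending order.
    std::set<int> items;
    for (const auto& t : db) items.insert(t.begin(), t.end());
    std::vector<Itemset> level;
    for (int item : items) {
        Itemset c{item};
        std::size_t s = count_support(db, c);
        if (s >= min_support) { result[c] = s; level.push_back(c); }
    }

    // Level k+1: join two frequent k-itemsets that share their first
    // k-1 items, then keep the candidates that are frequent.
    while (!level.empty()) {
        std::vector<Itemset> next;
        for (std::size_t i = 0; i < level.size(); ++i) {
            for (std::size_t j = i + 1; j < level.size(); ++j) {
                const Itemset& a = level[i];
                const Itemset& b = level[j];
                if (!std::equal(a.begin(), a.end() - 1, b.begin())) continue;
                Itemset c = a;
                c.push_back(b.back());  // stays sorted: a.back() < b.back()
                std::size_t s = count_support(db, c);
                if (s >= min_support) { result[c] = s; next.push_back(c); }
            }
        }
        level = std::move(next);
    }
    return result;
}
```

Calling `frequent_itemsets(db, 3)` on a small database of sorted transactions returns a map from each frequent itemset to its support. A production miner would add candidate pruning and a more compact database representation.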
Current Students
Mohammed AlHasan
Vineet Chaoji
Saeed Salem
Past Students/Contributors
Adnan Saifee
Nagender Parimi
Joe Urban
Paolo Palmerini
Nilanjana De
Benjarath Phoophakdee
Feng Gao
Jeevan Pathuri
SPIDER Related Links
Papers on Data Mining
Introduction to Data Mining Course
Open Source Software