Software
Software.Software History
Sampling Minimal Boolean Expressions (minDNF)
The minDNF method samples minimal boolean expressions in DNF.
- Download
Interesting Subspace Mining (SCHISM)
SCHISM finds interesting subspace clusters.
- Download
- Relevant Publications
- Karlton Sequeira and Mohammed J. Zaki, SCHISM: A New Approach for Interesting Subspace Mining. In 4th IEEE International Conference on Data Mining. Nov 2004. (PDF) (BibTeX)
- Karlton Sequeira and Mohammed J. Zaki, SCHISM: A New Approach to Interesting Subspace Mining. International Journal of Business Intelligence and Data Mining, 1(2):137-160. 2005. (PDF) (BibTeX)
\\\
Graph Pattern Sampling (MUSK)
Graph Pattern Sampling (Output Space Sampling)
- MUSK code (coming soon): for maximal patterns
- Graph Sampling code (coming soon): for all, support-biased and discriminative patterns
- Graph Sampling code: for all, support-biased and discriminative graph pattern sampling
- arulesSequences (R): R package that contains the cSPADE code (Courtesy: Christian Buchta and Michael Hahsler, Vienna University of Economics and Business Administration).
- cSpade (win) code: same as above, but for the Windows platform (Courtesy: Daniel Diaz, University of Paris 1 - Pantheon Sorbonne).
- Utilities (win) code: same as above, but for the Windows platform (Courtesy: Daniel Diaz, University of Paris 1 - Pantheon Sorbonne).
- BLOSOM code: mug finds minimal or-clauses, xng finds minimal CNF expressions, and xug finds minimal DNF expressions.
- BLOSOM code: mug finds minimal or-clauses, and-clauses, CNF and DNF expressions; xng finds closed CNF expressions; and xug finds closed DNF expressions.
Graph Mining
Graph Mining & Indexing
[[#grail]]
Scalable Graph Reachability Indexing (GRAIL)
GRAIL uses multiple randomized interval labelings and a variety of optimizations to perform rapid reachability testing in very large graphs (with millions of nodes and edges).
- Download
- GRAIL code
- Relevant Publications
- Hilmi Yildirim, Vineet Chaoji and Mohammed J. Zaki, GRAIL: Scalable Reachability Index for Large Graphs. Proceedings of the VLDB Endowment (36th International Conference on Very Large Data Bases), 3(1):276-284. 2010. (PDF) (BibTeX)
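The interval-labeling idea behind GRAIL can be illustrated with a minimal Python sketch. This is not the released GRAIL code: the function names (`label_once`, `build_index`, `reachable`), the adjacency-dictionary input format, and the default of three labelings are assumptions made for illustration. Each labeling assigns a node the interval from the lowest rank among its descendants to its own post-order rank. If the target's interval is not contained in the source's interval under any one labeling, the target is definitely unreachable; otherwise the query falls back to a label-pruned search.

```python
import random


def label_once(graph, rng):
    """Return one randomized interval labeling {node: (low, rank)} of a DAG.

    graph maps every node to a list of its children (hypothetical input format).
    """
    labels = {}
    rank = 0

    def visit(node):
        nonlocal rank
        children = list(graph[node])
        rng.shuffle(children)  # a random child order gives a different labeling
        low = None
        for child in children:
            if child not in labels:
                visit(child)
            child_low = labels[child][0]
            low = child_low if low is None else min(low, child_low)
        rank += 1  # post-order rank of this node
        labels[node] = (rank if low is None else min(low, rank), rank)

    has_parent = {c for children in graph.values() for c in children}
    roots = [n for n in graph if n not in has_parent]
    rng.shuffle(roots)
    for root in roots:
        visit(root)
    return labels


def build_index(graph, k=3, seed=0):
    """Build k independent random labelings (k=3 is an arbitrary choice)."""
    rng = random.Random(seed)
    return [label_once(graph, rng) for _ in range(k)]


def reachable(graph, index, source, target):
    """Answer a reachability query with a label-pruned depth-first search."""

    def may_reach(node):
        # Interval containment is necessary (not sufficient) for reachability.
        return all(
            lab[node][0] <= lab[target][0] and lab[target][1] <= lab[node][1]
            for lab in index
        )

    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if not may_reach(node):
            continue  # no descendant of this node can be the target
        for child in graph[node]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["c"]}
index = build_index(graph)
print(reachable(graph, index, "a", "d"))  # True
print(reachable(graph, index, "e", "b"))  # False
```

The recursive traversal is only suitable for small examples; graphs with millions of nodes would need an iterative traversal.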
\\\
- Mohammed J. Zaki, Christopher D. Carothers and Boleslaw K. Szymanski, VOGUE: A Variable Order Hidden Markov Model with Duration based on Frequent Sequence Mining. ACM Transactions on Knowledge Discovery in Data, 4(1):Article 5. Jan 2010. (PDF) (BibTeX)
(:htoc start=1 end=2 class=htoneline:)
(:htoc start=1 end=2 class=htoneline:)
\\
(:htoc start=1 end=2:)
(:htoc start=1 end=2 class=htoneline:)
(:toc-back:)
Microarray Gene Expression Clustering
Triclusters and Biclusters (TriCluster and MicroCluster)
Microarray Gene Expression Clustering (TriCluster and MicroCluster)
Hidden Markov Models (HMM)
Biological Sequence Analysis
This section contains code for sequence modeling via Hidden Markov Models, code for structured motif extraction and search, and genome-scale disk-based suffix tree indexing.
Variable Order HMM with Duration (VOGUE)
Hidden Markov Models: Variable Order HMM with Duration (VOGUE)
(:toc-back:)
Genome Scale Indexing
Disk-based Suffix Trees (Trellis and Trellis+)
Genome Scale Indexing: Disk-based Suffix Trees (Trellis and Trellis+)
(:toc-back:)
Structured Sequence Motifs: Search and Extraction
sMotif and exMotif
Structured Sequence Motifs: Search and Extraction (sMotif and exMotif)
Protein Docking (ContextShapes)
Protein Docking and Partial Shape Matching (ContextShapes)
ContextShapes does rigid-body protein docking. It uses a novel contextshapes data structure to represent local surface regions/shapes on the protein. All critical points on both the receptor and ligand are represented via context shapes, and the best docking is found via pair-wise matching.
complex and informative pattern types such as: Itemsets, Sequences, Trees and Graphs.
(:toc-back:)
(:toc-back:)
(:toc-back:)
(:toc-back:)
(:toc-back:)
Origami
Representative Orthogonal Graph Mining (Origami)
Graph Pattern Sampling
Graph Pattern Sampling (MUSK)
(:toc-back:)
This section contains code for mining categorical subspace clusters and shape-based clusters, for clustering based on a lower bound on similarity, and a new outlier-based approach for initial cluster seed selection.
CLICKS
Categorical Subspace Clustering (CLICKS)
Sparcl
Arbitrary Shape Clustering (Sparcl)
Robin
K-means Initialization (Robin)
(:toc-back:)
TriCluster and MicroCluster
Triclusters and Biclusters (TriCluster and MicroCluster)
(:toc-back:)
(:toc-back:)
Trellis and Trellis+
Disk-based Suffix Trees (Trellis and Trellis+)
(:toc-back:)
(:toc-back:)
(:toc-back:)
- [[Path:/~zaki/software/IBM-datagen.tar.gz | IBM Datagen program]]: contains the IBM synthetic dataset generator for itemset patterns
- IBM Datagen program: contains the IBM synthetic dataset generator for itemset patterns
(:toc-back:)
VOGUE is a variable order and gapped HMM with duration. It uses sequence mining to extract frequent patterns in the data. It then uses these patterns to build a variable order HMM with explicit duration on the gap states, for sequence modeling and classification.
VOGUE is a variable order and gapped HMM with duration. It uses sequence mining to extract frequent patterns in the data. It then uses these patterns to build a variable order HMM with explicit duration on the gap states, for sequence modeling and classification. VOGUE was applied to model protein sequences, as well as a number of other sequence datasets including weblogs.
Genome Scale Indexing
Genome Scale Indexing
Protein Structure
Flexible and Non-sequential Protein Structure Alignment (SNAP/STSA and FlexSnap)
SNAP finds non-sequential 3D protein structure alignments. It was initially called STSA (2008-snap). FlexSnap can find both flexible and non-sequential alignments.
Structured Sequence Motifs: Search and Extraction
sMotif and exMotif
sMotif and exMotif are two complementary methods for searching and extracting (mining) structured sequence motifs in DNA sequences. A structured motif consists of simple motifs separated by different gap lengths. A simple motif may be a simple pattern, a position weighted matrix, or a profile. Given a template structured motif (pattern or profile), sMotif finds all matches in a given set of sequences. exMotif, on the other hand, mines novel motifs that satisfy some minimal conditions on the gaps and frequency.
- exMotif code: for structured motif extraction
- sMotif code: for structured motif search
- Saeed Salem, Mohammed J. Zaki and Chris Bystroff, Iterative Non-Sequential Protein Structural Alignment. Journal of Bioinformatics and Computational Biology, 7(3):571-596. Jun 2009. (PDF) (BibTeX)
- Saeed Salem, Mohammed J. Zaki and Chris Bystroff, FlexSnap: Flexible Non-Sequential Protein Structure Alignment. In 9th Workshop on Algorithms in Bioinformatics. Sep 2009. (PDF) (BibTeX)
- Yongqiang Zhang and Mohammed J. Zaki, EXMOTIF: efficient structured motif extraction. Algorithms for Molecular Biology, 1(21). Nov 2006. (URL) (PDF) (BibTeX)
- Yongqiang Zhang and Mohammed J. Zaki, SMOTIF: efficient structured pattern and profile motif search. Algorithms for Molecular Biology, 1(22). Nov 2006. (URL) (PDF) (BibTeX)
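To make the notion of a structured motif concrete, here is a small Python sketch of structured motif search. It is a naive illustration under stated assumptions, not the sMotif algorithm or its code: the function name `find_structured_motif` and the representation of gaps as `(min, max)` pairs are hypothetical, and it handles only exact simple patterns, not position weighted matrices or profiles.

```python
def find_structured_motif(sequence, parts, gaps):
    """Yield tuples of start positions, one position per simple motif.

    parts: list of simple motifs (exact strings).
    gaps:  list of (min_gap, max_gap) pairs; gaps[i] constrains the number of
           characters between parts[i] and parts[i + 1].
    """
    if len(gaps) != len(parts) - 1:
        raise ValueError("need exactly one gap range between consecutive parts")

    def extend(index, prev_end, starts):
        if index == len(parts):
            yield tuple(starts)
            return
        min_gap, max_gap = gaps[index - 1]
        # Try every allowed gap length after the previous simple motif.
        for start in range(prev_end + min_gap, prev_end + max_gap + 1):
            if sequence.startswith(parts[index], start):
                end = start + len(parts[index])
                yield from extend(index + 1, end, starts + [start])

    for first in range(len(sequence) - len(parts[0]) + 1):
        if sequence.startswith(parts[0], first):
            yield from extend(1, first + len(parts[0]), [first])


dna = "GGTTGACACCCTATAATGG"
matches = list(find_structured_motif(dna, ["TTGACA", "TATAAT"], [(2, 4)]))
print(matches)  # [(2, 11)]
```

In this example the two simple motifs are separated by a gap of three characters, which falls inside the allowed range of two to four; tightening the range to `(0, 2)` yields no match.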
Protein Docking (ContextShapes)
Protein Structure
Flexible and Non-sequential Protein Structure Alignment (SNAP/STSA and FlexSnap)
SNAP finds non-sequential 3D protein structure alignments. It was initially called STSA (2008-snap). FlexSnap can find both flexible and non-sequential alignments.
- Zujun Shentu, Mohammad Al Hasan, Chris Bystroff and Mohammed J. Zaki, Context Shapes: Efficient Complementary Shape Matching for Protein-Protein Docking. Proteins: Structure, Function and Bioinformatics, 70(3):1056-1073. Feb 2008. (PDF) (BibTeX)
- Saeed Salem, Mohammed J. Zaki and Chris Bystroff, Iterative Non-Sequential Protein Structural Alignment. Journal of Bioinformatics and Computational Biology, 7(3):571-596. Jun 2009. (PDF) (BibTeX)
- Saeed Salem, Mohammed J. Zaki and Chris Bystroff, FlexSnap: Flexible Non-Sequential Protein Structure Alignment. In 9th Workshop on Algorithms in Bioinformatics. Sep 2009. (PDF) (BibTeX)
Protein Indexing (PSIST)
PSIST uses suffix trees to index protein 3D structure. It first converts the 3D structure into a structure-feature sequence over a new structural alphabet, which is then used to index protein structures. The PSIST index makes it very fast to query for a matching structural fragment.
Protein Docking (ContextShapes)
- Feng Gao and Mohammed J. Zaki, PSIST: A Scalable Approach to Indexing Protein Structures using Suffix Trees. Journal of Parallel and Distributed Computing, 68(1):55-63. Jan 2008. (PDF) (BibTeX)
- Zujun Shentu, Mohammad Al Hasan, Chris Bystroff and Mohammed J. Zaki, Context Shapes: Efficient Complementary Shape Matching for Protein-Protein Docking. Proteins: Structure, Function and Bioinformatics, 70(3):1056-1073. Feb 2008. (PDF) (BibTeX)
Real and Synthetic Datasets
This section contains various synthetic and real datasets used in some of the papers related to itemset, sequence, tree and XML mining.
Protein Indexing (PSIST)
PSIST uses suffix trees to index protein 3D structure. It first converts the 3D structure into a structure-feature sequence over a new structural alphabet, which is then used to index protein structures. The PSIST index makes it very fast to query for a matching structural fragment.
- [[Path:/~zaki/software/IBM-datagen.tgz | IBM Datagen program]]: contains the IBM synthetic dataset generator for itemset patterns.
- Tree Generator: contains the synthetic tree generator described in (2005-treeminer:tkde).
- Real Datasets: contains various real itemset datasets like chess, connect, mushroom, pumsb and so on, used in the papers on frequent, closed and maximal itemset mining.
- CSLOGS data: The CSLOGS data was used for (2005-treeminer:tkde).
- Xrules Log Data: The log data used for XML classification in (2006-xrules:mlj).
- Relevant Publications
- Feng Gao and Mohammed J. Zaki, PSIST: A Scalable Approach to Indexing Protein Structures using Suffix Trees. Journal of Parallel and Distributed Computing, 68(1):55-63. Jan 2008. (PDF) (BibTeX)
Protein Folding Pathways (UNFOLD)
UNFOLD uses a recursive min-cut on a weighted secondary structure element graph to predict the sequence of protein (un)folding events.
- Download
- Relevant Publications
- Mohammed J. Zaki, Vinay Nadimpally, Deb Bardhan and Chris Bystroff, Predicting protein folding pathways. Bioinformatics, 20(1):i386-i393. Aug 2004. (PDF) (BibTeX)
Real and Synthetic Datasets
This section contains various synthetic and real datasets used in some of the papers related to itemset, sequence, tree and XML mining.
- Download
- [[Path:/~zaki/software/IBM-datagen.tar.gz | IBM Datagen program]]: contains the IBM synthetic dataset generator for itemset patterns
- Tree Generator: contains the synthetic tree generator described in (2005-treeminer:tkde)
- Real Datasets: contains various real itemset datasets like chess, connect, mushroom, pumsb and so on, used in the papers on frequent, closed and maximal itemset mining
- CSLOGS data: The CSLOGS data was used for (2005-treeminer:tkde)
- Xrules Log Data: The log data used for XML classification in (2006-xrules:mlj)
- Xrules synthetic datasets: The synthetic classification data used for XML classification in (2006-xrules:mlj)
- Plan dataset: Planning dataset for sequence mining
Frequent Boolean Expressions (BLOSOM)
The BLOSOM framework allows one to mine arbitrary frequent boolean expressions, including AND clauses (itemsets), OR clauses, and CNF/DNF expressions. It focuses on mining the minimal boolean expressions.
- Utilities code: provides exttpose, getconf, and makebin
- BLOSOM code: mug finds minimal or-clauses, xng finds minimal CNF expressions, and xug finds minimal DNF expressions.
- Charm-L: this code can mine all minimal and closed and-clauses, i.e., all minimal and closed frequent itemsets.
- Relevant Publications
- Lizhuang Zhao, Mohammed J. Zaki and Naren Ramakrishnan, BLOSOM: A Framework for Mining Arbitrary Boolean Expressions. In 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2006. (PDF) (BibTeX)
Frequent Boolean Expressions (BLOSOM)
The BLOSOM framework allows one to mine arbitrary frequent boolean expressions, including AND clauses (itemsets), OR clauses, and CNF/DNF expressions. It focuses on mining the minimal boolean expressions.
- Download
- BLOSOM code: mug finds minimal or-clauses, xng finds minimal CNF expressions, and xug finds minimal DNF expressions.
- Charm-L: this code can mine all minimal and closed and-clauses, i.e., all minimal and closed frequent itemsets.
- Relevant Publications
- Lizhuang Zhao, Mohammed J. Zaki and Naren Ramakrishnan, BLOSOM: A Framework for Mining Arbitrary Boolean Expressions. In 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2006. (PDF) (BibTeX)
Tree Mining
Itemset and Sequence Utilities
Provides the utilities needed for Eclat, Charm/Charm-L, GenMax, Spade and cSpade.
- Download
- Utilities code: provides exttpose, getconf, and makebin
Tree Mining
Datasets
Real and Synthetic Datasets
This section contains various synthetic and real datasets used in some of the papers related to itemset, sequence, tree and XML mining.
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- [[Path:/~zaki/software/IBM-datagen.tgz | IBM Datagen program]]: contains the IBM synthetic dataset generator for itemset patterns.
- Tree Generator: contains the synthetic tree generator described in (2005-treeminer:tkde).
- Real Datasets: contains various real itemset datasets like chess, connect, mushroom, pumsb and so on, used in the papers on frequent, closed and maximal itemset mining.
- CSLOGS data: The CSLOGS data was used for (2005-treeminer:tkde).
- Xrules Log Data: The log data used for XML classification in (2006-xrules:mlj).
Frequent Boolean Expressions (BLOSOM)
The BLOSOM framework allows one to mine arbitrary frequent boolean expressions, including AND clauses (itemsets), OR clauses, and CNF/DNF expressions. It focuses on mining the minimal boolean expressions.
Itemset and Sequence Utilities
- BLOSOM code: mug finds minimal or-clauses, xng finds minimal CNF expressions, and xug finds minimal DNF expressions.
- Charm-L: this code can mine all minimal and closed and-clauses, i.e., all minimal and closed frequent itemsets.
- Relevant Publications
- Lizhuang Zhao, Mohammed J. Zaki and Naren Ramakrishnan, BLOSOM: A Framework for Mining Arbitrary Boolean Expressions. In 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2006. (PDF) (BibTeX)
- Utilities code: provides exttpose, getconf, and makebin
Frequent Boolean Expressions (BLOSOM)
The BLOSOM framework allows one to mine arbitrary frequent boolean expressions, including AND clauses (itemsets), OR clauses, and CNF/DNF expressions. It focuses on mining the minimal boolean expressions.
- Download
- BLOSOM code: mug finds minimal or-clauses, xng finds minimal CNF expressions, and xug finds minimal DNF expressions.
- Charm-L: this code can mine all minimal and closed and-clauses, i.e., all minimal and closed frequent itemsets.
- Relevant Publications
- Lizhuang Zhao, Mohammed J. Zaki and Naren Ramakrishnan, BLOSOM: A Framework for Mining Arbitrary Boolean Expressions. In 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2006. (PDF) (BibTeX)
Graph Mining
Graph Mining
Origami
Origami
Graph Pattern Sampling
Graph Pattern Sampling
TriCluster
TriCluster and MicroCluster
Tricluster is the first tri-clustering algorithm for microarray expression clustering. It builds upon the new microCluster bi-clustering approach. Tricluster first mines all the bi-clusters across the gene-sample slices, and then it extends these into tri-clusters across time or space (depending on the third dimension). It can find both scaling and shifting patterns.
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Lizhuang Zhao and Mohammed J. Zaki, TriCluster: An Effective Algorithm for Mining Coherent Clusters in 3D Microarray Data. In ACM SIGMOD Conference on Management of Data. Jun 2005. (PDF) (BibTeX)
- Lizhuang Zhao and Mohammed J. Zaki, MicroCluster: An Efficient Deterministic Biclustering Algorithm for Microarray Data. IEEE Intelligent Systems, 20(6):40-49. Nov/Dec 2005. (PDF) (BibTeX)
MicroCluster
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
Hidden Markov Models
VOGUE
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
Genome Scale Indexing
Trellis
Hidden Markov Models (HMM)
Variable Order HMM with Duration (VOGUE)
VOGUE is a variable order and gapped HMM with duration. It uses sequence mining to extract frequent patterns in the data. It then uses these patterns to build a variable order HMM with explicit duration on the gap states, for sequence modeling and classification.
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
Protein Structure Alignment
Snap
Genome Scale Indexing
Trellis and Trellis+
Trellis is a disk-based suffix tree indexing method (with suffix links) that is capable of indexing the entire human genome on a commodity PC with limited memory. Trellis+ extends Trellis by removing some remaining memory limitations through a novel guide suffix tree kept in memory.
- Trellis code: input sequence must be in memory
- Trellis+ code: removes all memory limitations
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Benjarath Phoophakdee and Mohammed J. Zaki, Genome-scale Disk-based Suffix Tree Indexing. In ACM SIGMOD International Conference on Management of Data. Jun 2007. (PDF) (BibTeX)
- Benjarath Phoophakdee and Mohammed J. Zaki, TRELLIS+: An Effective Approach for Indexing Genome-Scale Sequences using Suffix Trees. In 13th Pacific Symposium on Biocomputing. Jan 2008. (PDF) (BibTeX)
FlexSnap
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
Protein Docking
ContextShapes
Protein Structure
Flexible and Non-sequential Protein Structure Alignment (SNAP/STSA and FlexSnap)
SNAP finds non-sequential 3D protein structure alignments. It was initially called STSA (2008-snap). FlexSnap can find both flexible and non-sequential alignments.
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Saeed Salem, Mohammed J. Zaki and Chris Bystroff, Iterative Non-Sequential Protein Structural Alignment. Journal of Bioinformatics and Computational Biology, 7(3):571-596. Jun 2009. (PDF) (BibTeX)
- Saeed Salem, Mohammed J. Zaki and Chris Bystroff, FlexSnap: Flexible Non-Sequential Protein Structure Alignment. In 9th Workshop on Algorithms in Bioinformatics. Sep 2009. (PDF) (BibTeX)
Protein Indexing
PSIST
Protein Docking (ContextShapes)
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Zujun Shentu, Mohammad Al Hasan, Chris Bystroff and Mohammed J. Zaki, Context Shapes: Efficient Complementary Shape Matching for Protein-Protein Docking. Proteins: Structure, Function and Bioinformatics, 70(3):1056-1073. Feb 2008. (PDF) (BibTeX)
Utilities
Protein Indexing (PSIST)
PSIST uses suffix trees to index protein 3D structure. It first converts the 3D structure into a structure-feature sequence over a new structural alphabet, which is then used to index protein structures. The PSIST index makes it very fast to query for a matching structural fragment.
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Feng Gao and Mohammed J. Zaki, PSIST: A Scalable Approach to Indexing Protein Structures using Suffix Trees. Journal of Parallel and Distributed Computing, 68(1):55-63. Jan 2008. (PDF) (BibTeX)
\\
Eclat
Frequent Itemsets (Eclat)
Charm, Charm-L and Non-redundant Rule Generation
Closed Itemsets (Charm, Charm-L) and Non-redundant Rule Generation
GenMax
Maximal Itemsets (GenMax)
Frequent Boolean Expressions (BLOSOM)
The BLOSOM framework allows one to mine arbitrary frequent boolean expressions, including AND clauses (itemsets), OR clauses, and CNF/DNF expressions. It focuses on mining the minimal boolean expressions.
- Download
- BLOSOM code: mug finds minimal or-clauses, xng finds minimal CNF expressions, and xug finds minimal DNF expressions.
- Charm-L: this code can mine all minimal and closed and-clauses, i.e., all minimal and closed frequent itemsets.
- Relevant Publications
- Lizhuang Zhao, Mohammed J. Zaki and Naren Ramakrishnan, BLOSOM: A Framework for Mining Arbitrary Boolean Expressions. In 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2006. (PDF) (BibTeX)
Spade
Frequent Sequences (Spade)
cSpade
Sequence Constraints (cSpade)
SLEUTH
SLEUTH
XML Classification (XRules)
XRules uses frequent tree mining to mine discriminative patterns from tree-structured XML documents.
- Download
- Relevant Publications
- Mohammed J. Zaki and Charu C. Aggarwal, Xrules: An effective structural classifier for XML data. Machine Learning Journal, 62(1-2):137-170. Feb 2006. (PDF) (BibTeX)
---
Graph Mining
Graph Mining
- MUSK code: for maximal patterns
- Graph Sampling code: for all, support-biased and discriminative patterns
- MUSK code (coming soon): for maximal patterns
- Graph Sampling code (coming soon): for all, support-biased and discriminative patterns
Boolean Expression Mining
BLOSOM
Clustering
This section contains code for mining categorical subspace clusters and shape-based clusters, for clustering based on a lower bound on similarity, and a new outlier-based approach for initial cluster seed selection.
CLICKS
CLICKS finds subspace clusters in categorical data using a k-partite clique mining approach.
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Mohammed J. Zaki, Markus Peters, Ira Assent and Thomas Seidl, CLICKS: An Effective Algorithm for Mining Subspace Clusters in Categorical Datasets. Data and Knowledge Engineering, 60(1):51-70. Jan 2007. (PDF) (BibTeX)
Clustering
CLICKS
Sparcl
Sparcl finds shape-based clusters using a two-step approach. In the first step, it selects a relatively large number of candidate centroids (via ROBIN) and finds seed clusters with the K-means algorithm. In the second step, it uses a novel similarity kernel to merge the initial seed clusters, yielding the final arbitrary-shaped clusters.
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Vineet Chaoji, Mohammad Al Hasan, Saeed Salem and Mohammed J. Zaki, SPARCL: An Effective and Efficient Algorithm for Mining Arbitrary Shape-based Clusters. Knowledge and Information Systems, 21(2):201-229. Nov 2009. (PDF) (BibTeX)
Sparcl
Robin
ROBIN uses a new initial seed selection method based on the local outlier factor to improve k-means style clustering.
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Mohammad Al Hasan, Vineet Chaoji, Saeed Salem and Mohammed J. Zaki, Robust Partitional Clustering by Outlier and Density Insensitive Seeding. Pattern Recognition Letters, 30(11):994-1002. Aug 2009. (PDF) (BibTeX)
Robin
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
XML Classification
Xrules
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
SLEUTH extends the TreeMiner methodology to mine all frequent embedded or induced as well as ordered or unordered tree patterns.
SLEUTH extends the TreeMiner methodology to mine all frequent embedded or induced as well as ordered or unordered tree patterns.
---
---
Origami uses random walks over the graph partial order to mine a representative sample of the maximal frequent subgraph patterns. It next selects only the orthogonal representative patterns via a local clique-finding method.
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Vineet Chaoji, Mohammad Al Hasan, Saeed Salem, Jeremy Besson and Mohammed J. Zaki, ORIGAMI: A Novel and Effective Approach for Mining Representative Orthogonal Graph Patterns. Statistical Analysis and Data Mining, 1(2):67-84. Jun 2008. (PDF) (BibTeX)
Musk
Graph Pattern Sampling
Whereas Origami can mine a sample of maximal graph patterns, it does not provide any uniformity guarantee. MUSK (2009-musk) proposes a Markov Chain Monte Carlo (MCMC) based approach to guarantee a uniform sample of all maximal patterns. The MCMC approach was further extended in (2009-graphsampling) to mine a sample of all frequent patterns, support-biased patterns, and discriminative patterns.
- MUSK code: for maximal patterns
- Graph Sampling code: for all, support-biased and discriminative patterns
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Mohammad Al Hasan and Mohammed J. Zaki, MUSK: Uniform Sampling of k Maximal Patterns. In 9th SIAM International Conference on Data Mining. Apr 2009. (PDF) (BibTeX)
- Mohammad Al Hasan and Mohammed J. Zaki, Output Space Sampling for Graph Patterns. Proceedings of the VLDB Endowment (35th International Conference on Very Large Data Bases), 2(1):730-741. 2009. (PDF) (BibTeX)
OSS
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
\\
TreeMiner uses the vertical scope-list representation to mine frequent trees {[2005-treeminer:tkde]}. TreeMiner counts all embeddings, whereas TreeMiner-D counts only distinct occurrences, which can be more appropriate for some datasets.
TreeMiner uses the vertical scope-list representation to mine frequent trees (2005-treeminer:tkde). TreeMiner counts all embeddings, whereas TreeMiner-D counts only distinct occurrences, which can be more appropriate for some datasets.
- TreeMiner code: vtreeminer is the TreeMiner code, whereas htreeminer is the PatternMatcher code as mentioned in the paper below.
SLEUTH extends the TreeMiner methodology to mine all frequent embedded or induced as well as ordered or unordered tree patterns.
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Mohammed J. Zaki, Efficiently Mining Frequent Embedded Unordered Trees. Fundamenta Informaticae, 66(1-2):33-52. Mar/Apr 2005. (PDF) (BibTeX)
- You also need to download utilities such as getconf and exttpose from the Utilities section.
TreeMiner
TreeMiner and TreeMiner-D
TreeMiner uses the vertical scope-list representation to mine frequent trees {[2005-treeminer:tkde]}. TreeMiner counts all embeddings, whereas TreeMiner-D counts only distinct occurrences, which can be more appropriate for some datasets.
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
TreeMiner-D
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Mohammed J. Zaki, Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering, 17(8):1021-1035. Aug 2005. (PDF) (BibTeX)
Spade uses the vertical format for mining the set of all frequent sequences from a dataset of many sequences. It also mines the frequent sequences of itemsets.
- Spade code
- You also need to download utilities such as getconf and exttpose from the Utilities section.
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Mohammed J. Zaki, SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning Journal, 42(1/2):31-60. Jan/Feb 2001. (PDF) (BibTeX)
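The vertical format mentioned for Spade can be sketched in a few lines of Python. This is a simplified illustration, not the released Spade code: the function names (`to_vertical`, `temporal_join`, `support`) and the toy database are assumptions. Each item is mapped to an id-list of (sequence id, event id) pairs, the support of a pattern is the number of distinct sequence ids in its id-list, and a two-element sequence is obtained by joining two id-lists on the sequence id with a temporal ordering check.

```python
from collections import defaultdict


def to_vertical(database):
    """Convert {sid: [(eid, itemset), ...]} into {item: [(sid, eid), ...]}."""
    idlists = defaultdict(list)
    for sid, events in database.items():
        for eid, items in events:
            for item in items:
                idlists[item].append((sid, eid))
    return idlists


def temporal_join(first, second):
    """Id-list of the sequence 'first -> second'.

    Keeps each occurrence of `second` that comes strictly after the earliest
    occurrence of `first` in the same input sequence.
    """
    earliest = {}
    for sid, eid in first:
        earliest[sid] = min(eid, earliest.get(sid, eid))
    return [(sid, eid) for sid, eid in second
            if sid in earliest and eid > earliest[sid]]


def support(idlist):
    """Number of distinct input sequences that contain the pattern."""
    return len({sid for sid, _ in idlist})


# Toy database (hypothetical): sequence id -> list of (event id, itemset).
database = {
    1: [(10, {"A", "B"}), (20, {"C"}), (30, {"A"})],
    2: [(10, {"C"}), (20, {"A"})],
    3: [(10, {"A"}), (15, {"B"}), (25, {"C"})],
}
idlists = to_vertical(database)
a_then_c = temporal_join(idlists["A"], idlists["C"])
print(support(idlists["A"]))  # 3
print(a_then_c)               # [(1, 20), (3, 25)]
print(support(a_then_c))      # 2
```

Because support is computed from id-list joins alone, the original database does not need to be rescanned for each candidate pattern.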
cSpade mines constrained frequent sequences. The constraints can take the form of length or width limitations on the sequences, minimum or maximum gap constraints on consecutive sequence elements, applying a time window on allowable sequences, incorporating item constraints, and finding sequences predictive of one or more classes. The class specific sequences can be used for sequence classification as described in {[2000-featuremine:is]}.
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Mohammed J. Zaki, Sequence Mining in Categorical Domains: Incorporating Constraints. In 9th ACM International Conference on Information and Knowledge Management. Nov 2000. (PDF) (BibTeX)
- You also need to download utilities like getconf, exttpose from Utilities section.
- Eclat code
- You also need to download utilities such as getconf and exttpose from the Utilities section.
Charm-L adds the ability to construct the entire frequent concept lattice, that is, it adds the links
Charm-L adds the ability to construct the entire frequent concept lattice (also called the iceberg lattice), that is, it adds the links
- You also need to download utilities such as getconf and exttpose from the Utilities section.
GenMax mines all maximal frequent itemsets via a backtracking approach with progressive focusing.
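The following is a minimal sketch of backtracking search for maximal frequent itemsets, assuming transactions are given as Python sets. It does not implement progressive focusing; maximality is checked with a plain subset test against the itemsets found so far, and the function name `maximal_frequent` is hypothetical.

```python
from typing import FrozenSet, List, Set


def maximal_frequent(transactions: List[Set[str]], minsup: int) -> List[FrozenSet[str]]:
    """Return all frequent itemsets that have no frequent proper superset."""
    items = sorted({item for t in transactions for item in t})
    maximal: List[FrozenSet[str]] = []

    def sup(itemset: FrozenSet[str]) -> int:
        # Support is the number of transactions containing the itemset.
        return sum(1 for t in transactions if itemset <= t)

    def backtrack(current: FrozenSet[str], candidates: List[str]) -> None:
        extended = False
        for idx, item in enumerate(candidates):
            new = current | {item}
            if sup(new) >= minsup:
                extended = True
                # Only later items are tried, so each itemset is visited once.
                backtrack(new, candidates[idx + 1:])
        # A non-extendable itemset is maximal unless an earlier result covers it.
        if not extended and current and not any(current <= m for m in maximal):
            maximal.append(current)

    backtrack(frozenset(), items)
    return maximal


# Hypothetical toy data.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
max_at_3 = maximal_frequent(toy, 3)
max_at_2 = maximal_frequent(toy, 2)
```

Because items are explored in a fixed order, any frequent superset that uses an earlier item has already been visited by the time a candidate is tested, which is why checking against the results found so far is sufficient in this sketch.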
- Download
- GenMax Code
- You also need to download utilities such as getconf and exttpose from the Utilities section.
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
---
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
BLOSOM
BLOSOM
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
CLICKS
Sparcl
Robin
CLICKS
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
Sparcl
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
Robin
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
Xrules
Xrules
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
TriCluster
MicroCluster
TriCluster
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
MicroCluster
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
VOGUE
VOGUE
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
Trellis
Trellis
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
Snap
FlexSnap
Snap
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
FlexSnap
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
ContextShapes
ContextShapes
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
PSIST
PSIST
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
Datasets
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
Datasets
- Download
- Relevant Publications
- Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
DMTL implements all algorithms following the vertical data mining approach. For itemsets, the implementation follows the Eclat approach (without diffsets). For sequences it follows the Spade approach. For trees it implements the SLEUTH framework, which allows one to mine embedded/induced and ordered/unordered trees. Finally, the graph mining framework uses a novel vertical approach.
DMTL implements all algorithms following the vertical data mining approach. For itemsets, the implementation follows the Eclat approach (without diffsets). For sequences it follows the Spade approach. For trees it implements the SLEUTH framework, which allows one to mine embedded/induced and ordered/unordered trees. Finally, the graph mining framework uses a novel vertical approach.
Itemset Mining
This section contains the code for mining all frequent itemsets, all closed itemsets, and all maximal frequent itemsets.
Itemset Mining and Association Rules
This section contains the code for mining all frequent itemsets, all closed itemsets, and all maximal frequent itemsets. It also includes the code for constructing the concept lattice (or iceberg lattice) and generating nonredundant association rules.
(1997-eclat), combined with the diffsets improvement (2003-diffsets). The code also contains the maxeclat, clique, and maxclique approaches mentioned in
(1997-eclat), combined with the diffsets improvement (2003-diffsets). The code also contains the maxeclat, clique, and maxclique approaches mentioned in
Charm mines all the frequent closed itemsets as described in {[]}.
Charm mines all the frequent closed itemsets as described in (2002-charm). Charm-L adds the ability to construct the entire frequent concept lattice, that is, it adds the links between all sub/super-concepts (or closed itemsets) (2005-charm:tkde). This ability is used to mine the non-redundant association rules (2004-nonredundant:dmkd).
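For readers unfamiliar with closed itemsets, the sketch below shows the underlying definition only: an itemset is closed when it equals the intersection of all transactions that contain it. This brute-force check is an illustration under that definition and is not how Charm searches; the names `closure` and `is_closed` are hypothetical.

```python
from typing import FrozenSet, List, Set


def closure(itemset: Set[str], transactions: List[Set[str]]) -> FrozenSet[str]:
    """Intersection of all transactions containing the itemset.
    If no transaction contains it, the itemset is returned unchanged."""
    covering = [set(t) for t in transactions if itemset <= t]
    if not covering:
        return frozenset(itemset)
    return frozenset(set.intersection(*covering))


def is_closed(itemset: Set[str], transactions: List[Set[str]]) -> bool:
    """An itemset is closed if no proper superset has the same support."""
    return closure(itemset, transactions) == frozenset(itemset)


# Hypothetical toy data.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
closure_of_a = closure({"a"}, toy)
```

Here every transaction containing a also contains b, so {a} is not closed while {a, b} is; the closed itemsets are the nodes that the concept lattice links together.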
- Download
- Relevant Publications
- Mohammed J. Zaki and Ching-Jui Hsiao, Efficient Algorithms for Mining Closed Itemsets and their Lattice Structure. IEEE Transactions on Knowledge and Data Engineering, 17(4):462-478. Apr 2005. (PDF) (BibTeX)
- Mohammed J. Zaki, Mining Non-Redundant Association Rules. Data Mining and Knowledge Discovery: An International Journal, 9(3):223-248. Nov 2004. (PDF) (BibTeX)
- Mohammed J. Zaki and Karam Gouda, Fast Vertical Mining Using Diffsets. In 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2003. (PDF) (BibTeX)
- Mohammed J. Zaki and Karam Gouda, Fast Vertical Mining Using Diffsets. In 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2003. (PDF) (BibTeX)
Charm
Charm-L
Charm, Charm-L and Non-redundant Rule Generation
Charm mines all the frequent closed itemsets as described in {[]}.
- Mohammed J. Zaki, Scalable Algorithms for Association Mining. IEEE Transactions on Knowledge and Data Engineering, 12(3):372-390. May/Jun 2000. (PDF) (BibTeX)
- Vineet Chaoji, Mohammad Al Hasan, Saeed Salem and Mohammed J. Zaki, An integrated, generic approach to pattern mining: data mining template library. Data Mining and Knowledge Discovery, 17(3):457-495. Dec 2008. (PDF) (BibTeX)
- Mohammed J. Zaki and Karam Gouda, Fast Vertical Mining Using Diffsets. In 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2003. (PDF) (BibTeX)
Eclat uses the original vertical tidset approach for mining all frequent itemsets (1997-eclat), combined with the diffsets improvement (2003-diffsets). The code also contains the maxeclat, clique, and maxclique approaches mentioned in (2000-eclat:tkde).
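A minimal sketch of the vertical tidset idea follows, assuming transactions are given as Python sets: each item maps to the set of transaction ids containing it, and the support of a larger itemset is the size of an intersection of tidsets. The helper `support_via_diffset` shows the diffset identity on a single pair. All names here are hypothetical, and the maxeclat, clique, and maxclique variants are not covered.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def to_vertical(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Map each item to its tidset (ids of the transactions containing it)."""
    vertical: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def eclat(transactions: List[Set[str]], minsup: int) -> Dict[FrozenSet[str], int]:
    """Return {itemset: support} for every itemset with support >= minsup."""
    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix_class: List[Tuple[Tuple[str, ...], Set[int]]]) -> None:
        # All members of prefix_class share the same prefix and are frequent.
        for i, (items_a, tids_a) in enumerate(prefix_class):
            frequent[frozenset(items_a)] = len(tids_a)
            next_class = []
            for items_b, tids_b in prefix_class[i + 1:]:
                tids = tids_a & tids_b  # tidset intersection gives the support
                if len(tids) >= minsup:
                    next_class.append((items_a + (items_b[-1],), tids))
            if next_class:
                extend(next_class)

    vertical = to_vertical(transactions)
    roots = [((item,), tids) for item, tids in sorted(vertical.items())
             if len(tids) >= minsup]
    extend(roots)
    return frequent


def support_via_diffset(tids_x: Set[int], tids_y: Set[int]) -> int:
    """support(XY) = support(X) - |t(X) - t(Y)|, using the diffset of XY."""
    diffset = tids_x - tids_y
    return len(tids_x) - len(diffset)


# Hypothetical toy data.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = eclat(toy, 3)
```

Diffsets tend to pay off on dense data, where the difference between two tidsets is much smaller than their intersection.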
Spade
cSpade
TreeMiner
TreeMiner-D
SLEUTH
DMTL implements all algorithms following the vertical data mining approach. For itemsets, the implementation follows the Eclat approach (without diffsets). For sequences it follows the Spade approach. For trees it implements the SLEUTH framework, which allows one to mine embedded/induced and ordered/unordered trees. Finally, the graph mining framework uses a novel vertical approach.
Itemset Mining
Eclat
Charm
Charm-L
GenMax
Itemset Mining
This section contains the code for mining all frequent itemsets, all closed itemsets, and all maximal frequent itemsets.
Eclat
Eclat uses the original vertical tidset approach for mining all frequent itemsets (1997-eclat), combined with the diffsets improvement (2003-diffsets). The code also contains the maxeclat, clique, and maxclique approaches mentioned in (2000-eclat:tkde).
- Download
- Relevant Publications
- Vineet Chaoji, Mohammad Al Hasan, Saeed Salem and Mohammed J. Zaki, An integrated, generic approach to pattern mining: data mining template library. Data Mining and Knowledge Discovery, 17(3):457-495. Dec 2008. (PDF) (BibTeX)
Charm
Charm-L
GenMax
Acknowledgments: We gratefully acknowledge the funding from the following agencies/programs that made the research possible:
Acknowledgments
We gratefully acknowledge the funding from the following agencies/programs that made the research possible:
(:toc-page Software/Software self=1:)
(:*toc:)
complex and informative pattern types such as: Itemsets, Sequences, Trees and Graphs.
complex and informative pattern types such as: Itemsets, Sequences, Trees and Graphs.
Download:
- Sourceforge Project Page
- Sourceforge Download Page
Relevant Publications
- Vineet Chaoji, Mohammad Al Hasan, Saeed Salem and Mohammed J. Zaki, An integrated, generic approach to pattern mining: data mining template library. Data Mining and Knowledge Discovery, 17(3):457-495. Dec 2008. (PDF) (BibTeX)
- Download
- Sourceforge Project Page
- Sourceforge Download Page
- Relevant Publications
- Vineet Chaoji, Mohammad Al Hasan, Saeed Salem and Mohammed J. Zaki, An integrated, generic approach to pattern mining: data mining template library. Data Mining and Knowledge Discovery, 17(3):457-495. Dec 2008. (PDF) (BibTeX)
(:*toc:)
Acknowledgments
We gratefully acknowledge the funding from the following agencies that made the research possible: National Science Foundation -- Information and Data Management program (IIS-0092978), and Next Generation Software program (EIA-0103708); and Department of Energy, Office of Science (DE-FG02-02ER25538)
Acknowledgments: We gratefully acknowledge the funding from the following agencies/programs that made the research possible:
- NSF Information and Data Management program (Grant: IIS-0092978)
- NSF Next Generation Software program (Grant: EIA-0103708)
- DOE Office of Science (Grant: DE-FG02-02ER25538)
- CIA/NSA/DNI/IARPA Knowledge Discovery and Dissemination Program (Grants: EIA-0225715, ACI-0342411, CNS-0332960, CNS-0422637, CNS-0540232, IIS-0830218)
- NSF Emerging Models and Technologies for Computation program (Grant: EMT-0829835)
- NIH Biomedical Imaging and Bioengineering Institute (Grant: 1R01EB0080161-02)
(:toc-page Software/Software self=1:)
(:*toc-float Table of Contents:) Disclaimer: The software is provided on an as-is basis for research purposes. There is no additional support offered, nor are the author(s) or their institutions liable under any circumstances.
Disclaimer: The software on this page is provided on an as-is basis for research purposes. There is no additional support offered, nor are the author(s) or their institutions liable under any circumstances.
(:*toc:)
Generic Pattern Mining
Data Mining Template Library
ItemsetMining
Acknowledgments
We gratefully acknowledge the funding from the following agencies that made the research possible: National Science Foundation -- Information and Data Management program (IIS-0092978), and Next Generation Software program (EIA-0103708); and Department of Energy, Office of Science (DE-FG02-02ER25538)
Data Mining Template Library (DMTL)
DMTL is an open-source, high-performance, generic data mining toolkit, written in C++. It provides a collection of generic algorithms and data structures for mining increasingly complex and informative pattern types such as: Itemsets, Sequences, Trees and Graphs.
DMTL utilizes a generic data mining approach, where all aspects of mining are controlled via a set of properties. The kind of pattern to be mined, the kind of mining approach to use, and the kind of data types and formats to mine over are all specified as a list of properties. This provides tremendous flexibility to customize the toolkit for various applications.
Download:
- Sourceforge Project Page
- Sourceforge Download Page
Relevant Publications
- Vineet Chaoji, Mohammad Al Hasan, Saeed Salem and Mohammed J. Zaki, An integrated, generic approach to pattern mining: data mining template library. Data Mining and Knowledge Discovery, 17(3):457-495. Dec 2008. (PDF) (BibTeX)
Itemset Mining
(:*toc-float:)
(:*toc-float Table of Contents:)
Charm-L
TreeMiner-D
Clustering
Boolean Expression Mining
BLOSOM
Clustering
CLICKS
XML Classification
Xrules
Microarray Gene Expression Clustering
TriCluster
MicroCluster
PSIST
PSIST
Utilities
Datasets
(:*toc-float:)
List of Software
(:*toc:)
Disclaimer: The software is provided on an as-is basis for research purposes. There is no additional support offered, nor are the author(s) or their institutions liable under any circumstances.
List of Software
(:*toc:)


