Software

Software.Software History

May 18, 2012, at 04:03 PM by 128.113.126.13 -
Added lines 138-148:

Sampling Minimal Boolean Expressions (minDNF)

The minDNF method samples minimal boolean expressions in DNF.

February 21, 2012, at 10:39 PM by 128.113.126.13 -
Added lines 315-329:

Interesting Subspace Mining (SCHISM)

SCHISM finds interesting subspace clusters.

  • Download
  • Relevant Publications
    • Karlton Sequeira and Mohammed J. Zaki, SCHISM: A New Approach for Interesting Subspace Mining. In 4th IEEE International Conference on Data Mining. Nov 2004. (PDF) (BibTeX)
    • Karlton Sequeira and Mohammed J. Zaki, SCHISM: A New Approach to Interesting Subspace Mining. International Journal of Business Intelligence and Data Mining, 1(2):137-160. 2005. (PDF) (BibTeX)

\\\

January 30, 2012, at 09:20 PM by 128.113.126.13 -
Changed lines 276-277 from:
Graph Pattern Sampling (MUSK)
to:
Graph Pattern Sampling (Output Space Sampling)
Changed lines 282-283 from:
to:
January 08, 2012, at 06:01 PM by 128.113.126.13 -
Added line 177:
  • arulesSequences (R): R package that contains the cSPADE code (Courtesy: Christian Buchta and Michael Hahsler, Vienna University of Economics and Business Administration).
January 08, 2012, at 05:58 PM by 128.113.126.13 -
Added line 176:
  • cSpade (win) code: same as above, but for the Windows platform (Courtesy: Daniel Diaz, University of Paris 1 - Pantheon Sorbonne).
Added line 196:
  • Utilities (win) code: same as above, but for the Windows platform (Courtesy: Daniel Diaz, University of Paris 1 - Pantheon Sorbonne).
October 10, 2011, at 04:07 PM by 128.113.126.13 -
Changed line 132 from:
  • BLOSOM code: mug finds minimal or-clauses, xng finds minimal CNF expressions, and xug finds minimal DNF expressions.
to:
  • BLOSOM code: mug finds minimal or-clauses, and-clauses, CNF and DNF expressions; xng finds closed CNF expressions; and xug finds closed DNF expressions.
April 19, 2011, at 01:13 PM by 128.113.126.13 -
Changed line 255 from:

Graph Mining

to:

Graph Mining & Indexing

Added lines 286-300:



[[#grail]]

Scalable Graph Reachability Indexing (GRAIL)

GRAIL uses multiple randomized interval labelings and a variety of optimizations to perform rapid reachability testing in very large graphs (with millions of nodes and edges).

  • Download
    • GRAIL code
  • Relevant Publications
    • Hilmi Yildirim, Vineet Chaoji and Mohammed J. Zaki, GRAIL: Scalable Reachability Index for Large Graphs. Proceedings of the VLDB Endowment (36th International Conference on Very Large Data Bases), 3(1):276-284. 2010. (PDF) (BibTeX)
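The interval-labeling idea described above can be sketched as follows. This is a minimal illustration, not the GRAIL code itself: the function names (build_labelings, reachable), the dict-of-lists graph format, and the default of three labelings are assumptions made for the example. Each labeling assigns every node an interval from a randomized post-order traversal; if the target's interval is not contained in the source's interval in any one labeling, the target is certainly unreachable, and otherwise a pruned depth-first search settles the query.

```python
import random


def build_labelings(graph, num_labelings=3, seed=0):
    """Build randomized interval labelings for a DAG.

    graph maps every node to a list of its children (assumed format).
    Returns a list of dicts, each mapping node -> (low, rank).
    Recursive, so only suitable for small illustrative graphs.
    """
    rng = random.Random(seed)
    has_parent = {c for kids in graph.values() for c in kids}
    roots = [n for n in graph if n not in has_parent]
    labelings = []
    for _ in range(num_labelings):
        label = {}
        next_rank = 1

        def visit(node):
            nonlocal next_rank
            if node in label:
                return
            kids = list(graph[node])
            rng.shuffle(kids)  # random child order gives a different labeling
            low = None
            for kid in kids:
                visit(kid)
                kid_low = label[kid][0]
                low = kid_low if low is None else min(low, kid_low)
            rank = next_rank  # post-order rank
            next_rank += 1
            label[node] = (rank if low is None else low, rank)

        order = list(roots)
        rng.shuffle(order)
        for root in order:
            visit(root)
        labelings.append(label)
    return labelings


def contains(outer, inner):
    """True if interval inner lies within interval outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def reachable(graph, labelings, source, target):
    """Exact reachability test: labels prune, DFS confirms."""
    if source == target:
        return True
    # Non-containment in any labeling proves that target is unreachable.
    if any(not contains(lab[source], lab[target]) for lab in labelings):
        return False
    return any(reachable(graph, labelings, kid, target) for kid in graph[source])


if __name__ == "__main__":
    dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["c"]}
    labels = build_labelings(dag)
    print(reachable(dag, labels, "a", "d"), reachable(dag, labels, "e", "b"))
```

Containment is only a necessary condition for reachability, which is why the search is still needed when every labeling passes; using several random labelings makes it more likely that at least one of them rules out an unreachable pair immediately.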

\\\

April 19, 2011, at 01:09 PM by 128.113.126.13 -
Added line 16:
  • HP Innovation Research Program
Changed line 372 from:
to:
Changed line 375 from:
  • Invalid BibTex Entry!
to:
  • Mohammed J. Zaki, Christopher D. Carothers and Boleslaw K. Szymanski, VOGUE: A Variable Order Hidden Markov Model with Duration based on Frequent Sequence Mining. ACM Transactions on Knowledge Discovery from Data, 4(1):Article 5. Jan 2010. (PDF)[==] (BibTeX)
Deleted line 23:
Deleted line 24:
Added line 146:

Added line 204:

Added line 255:

Added line 294:

Added line 362:

Added line 418:

Added line 481:

Added line 483:

Deleted line 25:
Added lines 23-24:

Added lines 27-28:

Deleted lines 4-5:

(:htoc start=1 end=2 class=htoneline:)

Added lines 22-26:

(:htoc start=1 end=2 class=htoneline:)


\\

Deleted lines 0-1:

(:htoc start=1 end=2:)

Added lines 5-6:

(:htoc start=1 end=2 class=htoneline:)

Deleted line 23:
Added lines 1-2:

(:htoc start=1 end=2:)

Changed lines 328-337 from:

(:toc-back:)



Microarray Gene Expression Clustering


to:



Changed lines 331-332 from:
Triclusters and Biclusters (TriCluster and MicroCluster)
to:
Microarray Gene Expression Clustering (TriCluster and MicroCluster)
Changed lines 351-353 from:

Hidden Markov Models (HMM)


to:

Biological Sequence Analysis

This section contains code for sequence modeling via Hidden Markov Models, code for structured motif extraction and search, and genome-scale disk-based suffix tree indexing.

Changed lines 356-357 from:
Variable Order HMM with Duration (VOGUE)
to:
Hidden Markov Models: Variable Order HMM with Duration (VOGUE)
Changed lines 367-378 from:

(:toc-back:)



Genome Scale Indexing


Disk-based Suffix Trees (Trellis and Trellis+)
to:



Genome Scale Indexing: Disk-based Suffix Trees (Trellis and Trellis+)
Changed lines 383-393 from:

(:toc-back:)



Structured Sequence Motifs: Search and Extraction


sMotif and exMotif
to:



Structured Sequence Motifs: Search and Extraction (sMotif and exMotif)
Changed lines 447-450 from:
Protein Docking (ContextShapes)
to:
Protein Docking and Partial Shape Matching (ContextShapes)

ContextShapes performs rigid-body protein docking. It uses a novel context shapes data structure to represent local surface regions/shapes on the protein. All critical points on both the receptor and the ligand are represented via context shapes, and the best docking is found via pairwise matching.

Changed line 5 from:
to:
Changed line 30 from:

complex and informative pattern types such as: Itemsets, Sequences, Trees and Graphs.

to:

complex and informative pattern types such as: Itemsets, Sequences, Trees and Graphs.

Changed line 5 from:
to:
Changed line 43 from:
to:
Changed line 72 from:
to:
Deleted line 20:
Changed lines 22-24 from:
to:
Added lines 52-54:

(:toc-back:)

Changed lines 56-57 from:
to:


Added lines 132-133:

(:toc-back:)

Added lines 174-175:

(:toc-back:)

Added lines 189-190:

(:toc-back:)

Added lines 239-240:

(:toc-back:)

Changed lines 250-251 from:
Origami
to:
Representative Orthogonal Graph Mining (Origami)
Changed lines 264-265 from:
Graph Pattern Sampling
to:
Graph Pattern Sampling (MUSK)
Added lines 277-278:

(:toc-back:)

Changed lines 286-288 from:

This section contains code for mining categorical subspace clusters, shape based clusters, for clustering based on a lower bound on similarity, and a new outlier-based approach for initial cluster seed selection.

to:

This section contains code for mining categorical subspace clusters, shape based clusters, for clustering based on a lower bound on similarity, and a new outlier-based approach for initial cluster seed selection.

Changed lines 290-291 from:
CLICKS
to:
Categorical Subspace Clustering (CLICKS)
Changed lines 304-305 from:
Sparcl
to:
Arbitrary Shape Clustering (Sparcl)
Changed line 318 from:
Robin
to:
K-means Initialization (Robin)
Added lines 328-329:

(:toc-back:)

Changed lines 339-340 from:
TriCluster and MicroCluster
to:
Triclusters and Biclusters (TriCluster and MicroCluster)
Added lines 352-353:

(:toc-back:)

Added lines 374-375:

(:toc-back:)

Changed lines 384-385 from:
Trellis and Trellis+
to:
Disk-based Suffix Trees (Trellis and Trellis+)
Added lines 398-399:

(:toc-back:)

Added lines 422-423:

(:toc-back:)

Added lines 481-482:

(:toc-back:)

Changed line 494 from:
to:
Changed lines 501-503 from:
to:

(:toc-back:)

Changed lines 349-350 from:

VOGUE is a variable order and gapped HMM with duration. It uses sequence mining to extract frequent patterns in the data. It then uses these patterns to build a variable order HMM with explicit duration on the gap states, for sequence modeling and classification.

to:

VOGUE is a variable order and gapped HMM with duration. It uses sequence mining to extract frequent patterns in the data. It then uses these patterns to build a variable order HMM with explicit duration on the gap states, for sequence modeling and classification. VOGUE was applied to model protein sequences, as well as a number of other sequence datasets including weblogs.

Deleted lines 357-358:

Changed line 360 from:

Genome Scale Indexing

to:

Added lines 363-365:

Genome Scale Indexing


Deleted lines 379-380:

Changed line 382 from:

Protein Structure

to:

Changed lines 385-388 from:
Flexible and Non-sequential Protein Structure Alignment (SNAP/STSA and FlexSnap)

SNAP finds non-sequential 3D protein structure alignments. It was initially called STSA (2008-snap). FlexSnap can find alignments that are both flexible and non-sequential.

to:

Structured Sequence Motifs: Search and Extraction


sMotif and exMotif

sMotif and exMotif are two complementary tools for searching and extracting/mining structured sequence motifs in DNA sequences. A structured motif consists of simple motifs separated by gaps of variable length. Each simple motif may be a simple pattern or a position weight matrix (profile). Given a template structured motif (pattern or profile), sMotif finds all matches in a given set of sequences. exMotif, on the other hand, mines novel motifs that satisfy some minimal conditions on the gaps and frequency.
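To make the notion of a structured motif concrete, here is a small sketch of the search task for the simple-pattern case (no profiles). It uses Python regular expressions as a stand-in and is not the sMotif algorithm; the function names, the (min, max) gap format, and the example sequences are assumptions made for illustration.

```python
import re


def structured_motif_regex(parts, gaps):
    """Compile a structured motif into a regex.

    parts: simple motifs as plain strings, e.g. ["ACG", "TT"].
    gaps:  one (min_len, max_len) pair between each consecutive pair of parts.
    """
    if len(gaps) != len(parts) - 1:
        raise ValueError("need exactly one gap range between consecutive parts")
    pieces = [re.escape(parts[0])]
    for (lo, hi), part in zip(gaps, parts[1:]):
        pieces.append("[ACGT]{%d,%d}" % (lo, hi))  # variable-length gap
        pieces.append(re.escape(part))
    # A lookahead lets matches that overlap each other be reported.
    return re.compile("(?=(" + "".join(pieces) + "))")


def find_matches(pattern, sequences):
    """Return (sequence index, start offset, matched text) triples.

    Reports at most one match per start position (the one the regex
    engine finds first), not every possible gap assignment.
    """
    return [
        (i, m.start(), m.group(1))
        for i, seq in enumerate(sequences)
        for m in pattern.finditer(seq)
    ]


if __name__ == "__main__":
    motif = structured_motif_regex(["ACG", "TT"], [(1, 2)])
    for hit in find_matches(motif, ["ACGATTACGCCTT", "GGGG"]):
        print(hit)
```

Here the template is ACG, followed by a gap of one or two bases, followed by TT. Extraction (the exMotif side) is the harder inverse problem of discovering such templates from the data, and is not covered by this sketch.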

Changed lines 394-396 from:
to:
Changed lines 399-400 from:
  • Saeed Salem, Mohammed J. Zaki and Chris Bystroff, Iterative Non-Sequential Protein Structural Alignment. Journal of Bioinformatics and Computational Biology, 7(3):571-596. Jun 2009. (PDF) (BibTeX)
  • Saeed Salem, Mohammed J. Zaki and Chris Bystroff, FlexSnap: Flexible Non-Sequential Protein Structure Alignment. In 9th Workshop on Algorithms in Bioinformatics. Sep 2009. (PDF) (BibTeX)
to:
  • Yongqiang Zhang and Mohammed J. Zaki, EXMOTIF: efficient structured motif extraction. Algorithms for molecular biology, 1(21). Nov 2006. ((URL)) (PDF) (BibTeX)
  • Yongqiang Zhang and Mohammed J. Zaki, SMOTIF: efficient structured pattern and profile motif search. Algorithms for molecular biology, 1(22). Nov 2006. ((URL)) (PDF) (BibTeX)
Changed lines 402-404 from:



Protein Docking (ContextShapes)
to:




Protein Structure


Flexible and Non-sequential Protein Structure Alignment (SNAP/STSA and FlexSnap)

SNAP finds non-sequential 3D protein structure alignments. It was initially called STSA (2008-snap). FlexSnap can find alignments that are both flexible and non-sequential.

Changed lines 416-417 from:
to:
Changed lines 420-421 from:
  • Zujun Shentu, Mohammad Al Hasan, Chris Bystroff and Mohammed J. Zaki, Context Shapes: Efficient Complementary Shape Matching for Protein-Protein Docking. Proteins: Structure, Function and Bioinformatics, 70(3):1056-1073. Feb 2008. (PDF)[==] (BibTeX)
to:
  • Saeed Salem, Mohammed J. Zaki and Chris Bystroff, Iterative Non-Sequential Protein Structural Alignment. Journal of Bioinformatics and Computational Biology, 7(3):571-596. Jun 2009. (PDF) (BibTeX)
  • Saeed Salem, Mohammed J. Zaki and Chris Bystroff, FlexSnap: Flexible Non-Sequential Protein Structure Alignment. In 9th Workshop on Algorithms in Bioinformatics. Sep 2009. (PDF) (BibTeX)
Changed lines 425-428 from:
Protein Indexing (PSIST)

PSIST uses suffix trees to index protein 3D structure. It first converts the 3D structure into a structure-feature sequence over a new structural alphabet, which is then used to index protein structures. The PSIST index makes it very fast to query for a matching structural fragment.

to:
Protein Docking (ContextShapes)
Changed lines 428-429 from:
to:
Changed line 431 from:
  • Feng Gao and Mohammed J. Zaki, PSIST: A Scalable Approach to Indexing Protein Structures using Suffix Trees. Journal of Parallel and Distributed Computing, 68(1):55-63. Jan 2008. (PDF)[==] (BibTeX)
to:
  • Zujun Shentu, Mohammad Al Hasan, Chris Bystroff and Mohammed J. Zaki, Context Shapes: Efficient Complementary Shape Matching for Protein-Protein Docking. Proteins: Structure, Function and Bioinformatics, 70(3):1056-1073. Feb 2008. (PDF)[==] (BibTeX)
Changed lines 433-441 from:




Real and Synthetic Datasets

This section contains various synthetic and real datasets used in some of the papers related to itemset, sequence, tree and XML mining.

to:



Protein Indexing (PSIST)

PSIST uses suffix trees to index protein 3D structure. It first converts the 3D structure into a structure-feature sequence over a new structural alphabet, which is then used to index protein structures. The PSIST index makes it very fast to query for a matching structural fragment.

Changed lines 441-445 from:
  • [[Path:/~zaki/software/IBM-datagen.tgz | IBM Datagen program]]: contains the IBM synthetic dataset generator for itemset patterns.
  • Tree Generator: contains the synthetic tree generator described in (2005-treeminer:tkde).
  • Real Datasets: contains various real itemset datasets like chess, connect, mushroom, pumsb and so on, used in the papers on frequent, closed and maximal itemset mining.
  • CSLOGS data: The CSLOGS data was used for (2005-treeminer:tkde).
  • Xrules Log Data: The log data used for XML classification in (2006-xrules:mlj).
to:
  • Relevant Publications
    • Feng Gao and Mohammed J. Zaki, PSIST: A Scalable Approach to Indexing Protein Structures using Suffix Trees. Journal of Parallel and Distributed Computing, 68(1):55-63. Jan 2008. (PDF) (BibTeX)



Protein Folding Pathways (UNFOLD)

UNFOLD uses a recursive min-cut on a weighted secondary structure element graph to predict the sequence of protein (un)folding events.

  • Download
  • Relevant Publications
    • Mohammed J. Zaki, Vinay Nadimpally, Deb Bardhan and Chris Bystroff, Predicting protein folding pathways. Bioinformatics, 20(1):i386-i393. Aug 2004. (PDF) (BibTeX)
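The recursive min-cut idea can be illustrated with a small sketch. It is not the UNFOLD code: it finds each minimum cut by brute-force enumeration (practical only for a handful of nodes), and the element names and edge weights below are hypothetical. The graph is split at its weakest cut, then each side is split again, so the nesting of the result gives one possible order of unfolding events.

```python
from itertools import combinations


def min_cut(nodes, weights):
    """Brute-force global minimum cut of the subgraph induced by nodes.

    weights maps frozenset({u, v}) -> edge weight (assumed format).
    Returns (cut_weight, side_a, side_b); side_a holds the smallest node name.
    """
    ordered = sorted(nodes)
    first, rest = ordered[0], ordered[1:]
    best = None
    # side_a always contains `first`, so each bipartition is tried once,
    # and k < len(rest) keeps side_b non-empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            side_a = {first, *extra}
            side_b = set(ordered) - side_a
            cut = sum(
                weights.get(frozenset((u, v)), 0)
                for u in side_a
                for v in side_b
            )
            if best is None or cut < best[0]:
                best = (cut, side_a, side_b)
    return best


def unfold(nodes, weights):
    """Recursively split at the minimum cut.

    Returns a single node name, or a tuple (cut_weight, left, right).
    """
    nodes = set(nodes)
    if len(nodes) == 1:
        return next(iter(nodes))
    cut, side_a, side_b = min_cut(nodes, weights)
    return (cut, unfold(side_a, weights), unfold(side_b, weights))


if __name__ == "__main__":
    # Hypothetical secondary structure elements and contact weights.
    w = {
        frozenset(("H1", "H2")): 5,
        frozenset(("S1", "S2")): 4,
        frozenset(("H2", "S1")): 1,
    }
    print(unfold(["H1", "H2", "S1", "S2"], w))
```

For graphs of realistic size, the enumeration would be replaced by a polynomial-time minimum cut algorithm such as Stoer-Wagner.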




Real and Synthetic Datasets

This section contains various synthetic and real datasets used in some of the papers related to itemset, sequence, tree and XML mining.

  • Download
    • [[Path:/~zaki/software/IBM-datagen.tar.gz | IBM Datagen program]]: contains the IBM synthetic dataset generator for itemset patterns
    • Tree Generator: contains the synthetic tree generator described in (2005-treeminer:tkde)
    • Real Datasets: contains various real itemset datasets like chess, connect, mushroom, pumsb and so on, used in the papers on frequent, closed and maximal itemset mining
    • CSLOGS data: The CSLOGS data was used for (2005-treeminer:tkde)
    • Xrules Log Data: The log data used for XML classification in (2006-xrules:mlj)
    • Xrules synthetic datasets: The synthetic classification data used for XML classification in (2006-xrules:mlj)
    • Plan dataset: Planning dataset for sequence mining
Changed lines 117-119 from:

Itemset and Sequence Utilities

to:

Frequent Boolean Expressions (BLOSOM)

The BLOSOM framework allows one to mine arbitrary frequent boolean expressions, including AND clauses (itemsets), OR clauses, and CNF/DNF expressions. It focuses on mining the minimal boolean expressions.

Changed lines 124-128 from:
to:
  • BLOSOM code: mug finds minimal or-clauses, xng finds minimal CNF expressions, and xug finds minimal DNF expressions.
  • Charm-L: this code can mine all minimal and closed and-clauses, i.e., all minimal and closed frequent itemsets.
  • Relevant Publications
    • Lizhuang Zhao, Mohammed J. Zaki and Naren Ramakrishnan, BLOSOM: A Framework for Mining Arbitrary Boolean Expressions. In 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2006. (PDF) (BibTeX)
Deleted lines 129-143:



Frequent Boolean Expressions (BLOSOM)

The BLOSOM framework allows one to mine arbitrary frequent boolean expressions, including AND clauses (itemsets), OR clauses, and CNF/DNF expressions. It focuses on mining the minimal boolean expressions.

  • Download
    • BLOSOM code: mug finds minimal or-clauses, xng finds minimal CNF expressions, and xug finds minimal DNF expressions.
    • Charm-L: this code can mine all minimal and closed and-clauses, i.e., all minimal and closed frequent itemsets.
  • Relevant Publications
    • Lizhuang Zhao, Mohammed J. Zaki and Naren Ramakrishnan, BLOSOM: A Framework for Mining Arbitrary Boolean Expressions. In 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2006. (PDF) (BibTeX)
Changed lines 175-182 from:

Tree Mining

to:

Itemset and Sequence Utilities

Provides the utilities needed for Eclat, Charm/Charm-L, GenMax, Spade and cSpade.

Added lines 185-190:


Tree Mining


Changed lines 384-385 from:
to:


Deleted lines 421-422:

Added line 423:
Changed lines 427-428 from:

Datasets

to:

Real and Synthetic Datasets

This section contains various synthetic and real datasets used in some of the papers related to itemset, sequence, tree and XML mining.

Changed lines 433-436 from:
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
to:
  • [[Path:/~zaki/software/IBM-datagen.tgz | IBM Datagen program]]: contains the IBM synthetic dataset generator for itemset patterns.
  • Tree Generator: contains the synthetic tree generator described in (2005-treeminer:tkde).
  • Real Datasets: contains various real itemset datasets like chess, connect, mushroom, pumsb and so on, used in the papers on frequent, closed and maximal itemset mining.
  • CSLOGS data: The CSLOGS data was used for (2005-treeminer:tkde).
  • Xrules Log Data: The log data used for XML classification in (2006-xrules:mlj).
Changed lines 117-121 from:

Frequent Boolean Expressions (BLOSOM)

The BLOSOM framework allows one to mine arbitrary frequent boolean expressions, including AND clauses (itemsets), OR clauses, and CNF/DNF expressions. It focuses on mining the minimal boolean expressions.

to:

Itemset and Sequence Utilities

Changed lines 122-126 from:
  • BLOSOM code: mug finds minimal or-clauses, xng finds minimal CNF expressions, and xug finds minimal DNF expressions.
  • Charm-L: this code can mine all minimal and closed and-clauses, i.e., all minimal and closed frequent itemsets.
  • Relevant Publications
    • Lizhuang Zhao, Mohammed J. Zaki and Naren Ramakrishnan, BLOSOM: A Framework for Mining Arbitrary Boolean Expressions. In 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2006. (PDF) (BibTeX)
to:
Added lines 124-138:



Frequent Boolean Expressions (BLOSOM)

The BLOSOM framework allows one to mine arbitrary frequent boolean expressions, including AND clauses (itemsets), OR clauses, and CNF/DNF expressions. It focuses on mining the minimal boolean expressions.

  • Download
    • BLOSOM code: mug finds minimal or-clauses, xng finds minimal CNF expressions, and xug finds minimal DNF expressions.
    • Charm-L: this code can mine all minimal and closed and-clauses, i.e., all minimal and closed frequent itemsets.
  • Relevant Publications
    • Lizhuang Zhao, Mohammed J. Zaki and Naren Ramakrishnan, BLOSOM: A Framework for Mining Arbitrary Boolean Expressions. In 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2006. (PDF) (BibTeX)
Deleted lines 226-229:

Changed line 229 from:

Graph Mining

to:

Added lines 232-234:

Graph Mining


Changed lines 236-237 from:

Origami

to:
Origami
Changed lines 250-251 from:

Graph Pattern Sampling

to:
Graph Pattern Sampling
Changed lines 271-273 from:
to:


Added line 287:

Added line 301:

Changed lines 318-319 from:
TriCluster
to:


TriCluster and MicroCluster

Tricluster is the first tri-clustering algorithm for microarray expression clustering. It builds upon the new microCluster bi-clustering approach. Tricluster first mines all the bi-clusters across the gene-sample slices, and then it extends these into tri-clusters across time or space (depending on the third dimension). It can find both scaling and shifting patterns.

Changed lines 327-328 from:
to:
Changed lines 331-332 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF)[==] (BibTeX)
to:
  • Lizhuang Zhao and Mohammed J. Zaki, TriCluster: An Effective Algorithm for Mining Coherent Clusters in 3D Microarray Data. In ACM SIGMOD Conference on Management of Data. Jun 2005. (PDF) (BibTeX)
  • Lizhuang Zhao and Mohammed J. Zaki, MicroCluster: An Efficient Deterministic Biclustering Algorithm for Microarray Data. IEEE Intelligent Systems, 20(6):40-49. Nov/Dec 2005. (PDF) (BibTeX)
Changed lines 334-344 from:



MicroCluster
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
to:


Deleted lines 336-348:

Hidden Markov Models

VOGUE
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)

Changed lines 339-341 from:

Genome Scale Indexing

Trellis
to:

Hidden Markov Models (HMM)


Variable Order HMM with Duration (VOGUE)

VOGUE is a variable order and gapped HMM with duration. It uses sequence mining to extract frequent patterns in the data. It then uses these patterns to build a variable order HMM with explicit duration on the gap states, for sequence modeling and classification.

Changed lines 349-350 from:
to:
Changed line 352 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF)[==] (BibTeX)
to:
  • Invalid BibTex Entry!
Changed lines 358-360 from:

Protein Structure Alignment

Snap
to:

Genome Scale Indexing


Trellis and Trellis+

Trellis is a disk-based suffix tree indexing method (with suffix links) that is capable of indexing the entire human genome on a commodity PC with limited memory. Trellis+ extends Trellis, removing further memory limitations by using a novel guide suffix tree in memory.

Changed lines 367-368 from:
to:
Changed lines 372-373 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF)[==] (BibTeX)
to:
  • Benjarath Phoophakdee and Mohammed J. Zaki, Genome-scale Disk-based Suffix Tree Indexing. In ACM SIGMOD International Conference on Management of Data. Jun 2007. (PDF) (BibTeX)
  • Benjarath Phoophakdee and Mohammed J. Zaki, TRELLIS+: An Effective Approach for Indexing Genome-Scale Sequences using Suffix Trees. In 13th Pacific Symposium on Biocomputing. Jan 2008. (PDF) (BibTeX)
Changed lines 375-385 from:



FlexSnap
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
to:
Changed lines 379-381 from:

Protein Docking

ContextShapes
to:

Protein Structure

Flexible and Non-sequential Protein Structure Alignment (SNAP/STSA and FlexSnap)

SNAP finds non-sequential 3D protein structure alignments. It was initially called STSA (2008-snap). FlexSnap can find alignments that are both flexible and non-sequential.

Changed lines 387-388 from:
to:
Changed lines 391-392 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF)[==] (BibTeX)
to:
  • Saeed Salem, Mohammed J. Zaki and Chris Bystroff, Iterative Non-Sequential Protein Structural Alignment. Journal of Bioinformatics and Computational Biology, 7(3):571-596. Jun 2009. (PDF) (BibTeX)
  • Saeed Salem, Mohammed J. Zaki and Chris Bystroff, FlexSnap: Flexible Non-Sequential Protein Structure Alignment. In 9th Workshop on Algorithms in Bioinformatics. Sep 2009. (PDF) (BibTeX)
Changed lines 394-400 from:


Protein Indexing

PSIST
to:



Protein Docking (ContextShapes)
Changed lines 399-400 from:
to:
Changed line 402 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF)[==] (BibTeX)
to:
  • Zujun Shentu, Mohammad Al Hasan, Chris Bystroff and Mohammed J. Zaki, Context Shapes: Efficient Complementary Shape Matching for Protein-Protein Docking. Proteins: Structure, Function and Bioinformatics, 70(3):1056-1073. Feb 2008. (PDF)[==] (BibTeX)
Changed lines 404-409 from:


Utilities

to:



Protein Indexing (PSIST)

PSIST uses suffix trees to index protein 3D structure. It first converts the 3D structure into a structure-feature sequence over a new structural alphabet, which is then used to index protein structures. The PSIST index makes it very fast to query for a matching structural fragment.

Changed lines 412-413 from:
to:
Changed line 415 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF)[==] (BibTeX)
to:
  • Feng Gao and Mohammed J. Zaki, PSIST: A Scalable Approach to Indexing Protein Structures using Suffix Trees. Journal of Parallel and Distributed Computing, 68(1):55-63. Jan 2008. (PDF)[==] (BibTeX)
Added lines 418-419:

\\

Changed lines 63-64 from:
Eclat
to:
Frequent Itemsets (Eclat)
Changed lines 82-83 from:
Charm, Charm-L and Non-redundant Rule Generation
to:
Closed Itemsets (Charm, Charm-L) and Non-redundant Rule Generation
Changed lines 103-104 from:
GenMax
to:
Maximal Itemsets (GenMax)
Added lines 115-129:



Frequent Boolean Expressions (BLOSOM)

The BLOSOM framework allows one to mine arbitrary frequent boolean expressions, including AND clauses (itemsets), OR clauses, and CNF/DNF expressions. It focuses on mining the minimal boolean expressions.

  • Download
    • BLOSOM code: mug finds minimal or-clauses, xng finds minimal CNF expressions, and xug finds minimal DNF expressions.
    • Charm-L: this code can mine all minimal and closed and-clauses, i.e., all minimal and closed frequent itemsets.
  • Relevant Publications
    • Lizhuang Zhao, Mohammed J. Zaki and Naren Ramakrishnan, BLOSOM: A Framework for Mining Arbitrary Boolean Expressions. In 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2006. (PDF) (BibTeX)
Changed lines 139-140 from:
Spade
to:
Frequent Sequences (Spade)
Changed lines 153-154 from:
cSpade
to:
Sequence Constraints (cSpade)
Changed lines 194-195 from:
SLEUTH
to:
SLEUTH
Added lines 205-221:



XML Classification (XRules)

XRules uses frequent tree mining to mine discriminative patterns from tree-structured XML documents.

  • Download
  • Relevant Publications
    • Mohammed J. Zaki and Charu C. Aggarwal, Xrules: An effective structural classifier for XML data. Machine Learning Journal, 62(1-2):137-170. Feb 2006. (PDF) (BibTeX)

Changed line 224 from:

---

to:

Graph Mining

Deleted lines 226-227:

Graph Mining

Changed lines 248-250 from:
to:
Changed lines 260-262 from:

Boolean Expression Mining

BLOSOM
to:

Clustering

This section contains code for mining categorical subspace clusters, shape based clusters, for clustering based on a lower bound on similarity, and a new outlier-based approach for initial cluster seed selection.

CLICKS

CLICKS finds subspace clusters in categorical data using a k-partite clique mining approach.

Changed lines 270-271 from:
to:
Changed line 273 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF)[==] (BibTeX)
to:
  • Mohammed J. Zaki, Markus Peters, Ira Assent and Thomas Seidl, CLICKS: An Effective Algorithm for Mining Subspace Clusters in Categorical Datasets. Data and Knowledge Engineering, 60(1):51-70. Jan 2007. (PDF)[==] (BibTeX)
Changed lines 275-281 from:


Clustering

CLICKS
to:



Sparcl

Sparcl finds shape-based clusters. It uses a two-step approach: in the first step, we select a relatively large number of candidate centroids (via ROBIN) to find seed clusters via the K-means algorithm; in the second step, we use a novel similarity kernel to merge the initial seed clusters to yield the final arbitrary-shaped clusters.

Changed lines 283-284 from:
to:
Changed line 286 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF)[==] (BibTeX)
to:
  • Vineet Chaoji, Mohammad Al Hasan, Saeed Salem and Mohammed J. Zaki, SPARCL: An Effective and Efficient Algorithm for Mining Arbitrary Shape-based Clusters. Knowledge and Information Systems, 21(2):201-229. Nov 2009. (PDF)[==] (BibTeX)
Changed lines 290-292 from:
Sparcl
to:
Robin

ROBIN uses a new local outlier factor based initial seed selection to improve k-means style clustering.

Changed lines 295-296 from:
to:
Changed line 298 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF)[==] (BibTeX)
to:
  • Mohammad Al Hasan, Vineet Chaoji, Saeed Salem and Mohammed J. Zaki, Robust Partitional Clustering by Outlier and Density Insensitive Seeding. Pattern Recognition Letters, 30(11):994-1002. Aug 2009. (PDF)[==] (BibTeX)
Deleted lines 299-310:



Robin
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)

Deleted lines 300-310:

XML Classification

Xrules
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
Changed lines 181-182 from:

SLEUTH extends the TreeMiner methodology to mine all frequent embedded or induced as well as ordered or unordered tree patterns.

to:

SLEUTH extends the TreeMiner methodology to mine all frequent embedded or induced as well as ordered or unordered tree patterns.

Deleted lines 189-190:

---

Added lines 192-194:

---

Added line 197:

Added lines 200-201:

Origami uses random walks over the graph partial order to mine a representative sample of the maximal frequent subgraph patterns. It next selects only the orthogonal representative patterns via a local clique-finding method.

Changed lines 204-205 from:
to:
Changed line 207 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF)[==] (BibTeX)
to:
  • Vineet Chaoji, Mohammad Al Hasan, Saeed Salem, Jeremy Besson and Mohammed J. Zaki, ORIGAMI: A Novel and Effective Approach for Mining Representative Orthogonal Graph Patterns. Statistical Analysis and Data Mining, 1(2):67-84. Jun 2008. (PDF)[==] (BibTeX)
Changed lines 211-215 from:

Musk

to:

Graph Pattern Sampling

Whereas Origami can mine a sample of maximal graph patterns, it does not provide any uniformity guarantee. MUSK (2009-musk) proposes a Markov Chain Monte Carlo based approach to guarantee a uniform sample of all maximal patterns. The MCMC approach was further extended in (2009-graphsampling) to mine a sample of all frequent patterns, to mine support-biased patterns, and also to mine a sample of discriminative patterns.

Changed lines 218-219 from:
to:
Changed lines 222-223 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF)[==] (BibTeX)
to:
  • Mohammad Al Hasan and Mohammed J. Zaki, MUSK: Uniform Sampling of k Maximal Patterns. In 9th SIAM International Conference on Data Mining. Apr 2009. (PDF) (BibTeX)
  • Mohammad Al Hasan and Mohammed J. Zaki, Output Space Sampling for Graph Patterns. Proceedings of the VLDB Endowment (35th International Conference on Very Large Data Bases), 2(1):730-741. 2009. (PDF) (BibTeX)
Changed lines 225-234 from:



OSS

  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
to:

\\

Changed lines 166-167 from:

TreeMiner uses the vertical scope-list representation to mine frequent trees {[2005-treeminer:tkde]}. TreeMiner counts all embeddings, whereas TreeMiner-D counts only distinct occurrences, which can be more appropriate for some datasets.

to:

TreeMiner uses the vertical scope-list representation to mine frequent trees (2005-treeminer:tkde). TreeMiner counts all embeddings, whereas TreeMiner-D counts only distinct occurrences, which can be more appropriate for some datasets.

Changed line 170 from:
to:
  • TreeMiner code: vtreeminer is the TreeMiner code, whereas htreeminer is the PatternMatcher code as mentioned in the paper below.
Added lines 181-182:

SLEUTH extends the TreeMiner methodology to mine all frequent embedded or induced as well as ordered or unordered tree patterns.

Changed lines 185-186 from:
to:
Changed line 188 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
to:
  • Mohammed J. Zaki, Efficiently Mining Frequent Embedded Unordered Trees. Fundamenta Informaticae, 66(1-2):33-52. Mar/Apr 2005. (PDF) (BibTeX)
Changed lines 121-122 from:
to:


Changed lines 150-151 from:
to:
  • You also need to download utilities such as getconf and exttpose from the Utilities section.
Deleted lines 154-155:

Added lines 157-159:


Changed lines 161-162 from:
TreeMiner
to:


TreeMiner and TreeMiner-D

TreeMiner uses the vertical scope-list representation to mine frequent trees {[2005-treeminer:tkde]}. TreeMiner counts all embeddings, whereas TreeMiner-D counts only distinct occurrences, which can be more appropriate for some datasets.

Changed lines 170-171 from:
to:
Changed lines 174-184 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)



TreeMiner-D
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
to:
  • Mohammed J. Zaki, Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering, 17(8):1021-1035. Aug 2005. (PDF) (BibTeX)
Changed lines 92-93 from:
to:
Changed line 109 from:
to:
Deleted lines 114-115:

Added lines 117-119:


Added lines 125-126:

Spade uses the vertical format for mining the set of all frequent sequences from a dataset of many sequences. It also mines the frequent sequences of itemsets.
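
The vertical format mentioned above can be sketched as follows. Each item gets an id-list of (sequence id, event id) pairs, and longer patterns are obtained by joining id-lists instead of rescanning the data. The function names and the toy database are assumptions made for illustration; this is a minimal sketch of the idea, not the Spade code itself.

```python
from collections import defaultdict


def build_idlists(db):
    """Turn {sid: [(eid, items), ...]} into item -> set of (sid, eid) pairs."""
    idlists = defaultdict(set)
    for sid, events in db.items():
        for eid, items in events:
            for item in items:
                idlists[item].add((sid, eid))
    return idlists


def temporal_join(a, b):
    """Id-list of the sequence 'a then b': b-events that follow some a-event."""
    first_a = {}
    for sid, eid in a:
        first_a[sid] = min(eid, first_a.get(sid, eid))
    return {(sid, eid) for sid, eid in b if sid in first_a and first_a[sid] < eid}


def equality_join(a, b):
    """Id-list of the itemset 'a and b in the same event'."""
    return a & b


def support(idlist):
    """Support is the number of distinct sequences, not the number of pairs."""
    return len({sid for sid, _ in idlist})


# Toy database (hypothetical): three sequences with timestamped events.
toy_db = {
    1: [(10, {"A", "B"}), (20, {"B"}), (30, {"A", "B"})],
    2: [(10, {"A"}), (20, {"B"})],
    3: [(10, {"B"}), (20, {"A"})],
}
idlists = build_idlists(toy_db)
a_then_b = temporal_join(idlists["A"], idlists["B"])
a_with_b = equality_join(idlists["A"], idlists["B"])
```

Here the sequence "A then B" occurs in sequences 1 and 2, while the itemset {A, B} occurs within a single event only in sequence 1.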

Changed lines 129-130 from:
to:
Changed line 133 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
to:
  • Mohammed J. Zaki, SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning Journal, 42(1/2):31-60. Jan/Feb 2001. (PDF) (BibTeX)
Added lines 139-145:

cSpade mines constrained frequent sequences. The constraints can take the form of length or width limitations on the sequences, minimum or maximum gap constraints on consecutive sequence elements, a time window on allowable sequences, item constraints, and restrictions to sequences predictive of one or more classes. The class-specific sequences can be used for sequence classification as described in {[2000-featuremine:is]}.

Changed lines 148-149 from:
to:
Changed line 151 from:
  • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
to:
  • Mohammed J. Zaki, Sequences Mining in Categorical Domains: Incorporating Constraints. In 9th ACM International Conference on Information and Knowledge Management. Nov 2000. (PDF) (BibTeX)
Changed line 70 from:
to:
Changed lines 73-74 from:
  • You also need to download utilities such as getconf and exttpose from the Utilities section.
to:
  • You also need to download utilities such as getconf and exttpose from the Utilities section.
Changed lines 94-95 from:
  • You also need to download utilities such as getconf and exttpose from the Utilities section.
to:
  • You also need to download utilities such as getconf and exttpose from the Utilities section.
Changed line 110 from:
  • You also need to download utilities such as getconf and exttpose from the Utilities section.
to:
  • You also need to download utilities such as getconf and exttpose from the Utilities section.
Changed lines 72-73 from:
to:
Changed lines 79-81 from:
to:



Changed lines 85-86 from:

Charm-L adds the ability to construct the entire frequent concept lattice, that is, it adds the links

to:

Charm-L adds the ability to construct the entire frequent concept lattice (also called the iceberg lattice), that is, it adds the links

Changed lines 94-95 from:
to:
  • You also need to download utilities such as getconf and exttpose from the Utilities section.
Changed lines 100-101 from:
to:



Added lines 105-118:

GenMax mines all maximal frequent itemsets via a backtracking approach with progressive focusing.

  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)


Changed lines 124-132 from:
to:
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)



Added lines 135-145:
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)


Changed lines 149-157 from:
to:
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)



Changed lines 159-160 from:
to:
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)



Added lines 171-181:
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)

---

Added line 183:
Added lines 185-194:
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)



Added lines 196-204:
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)



Changed lines 206-216 from:
to:
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)


Changed lines 218-219 from:

BLOSOM

to:
BLOSOM
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)


Changed lines 232-235 from:

CLICKS

Sparcl

Robin

to:
CLICKS
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)



Sparcl
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)



Robin
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)


Changed lines 266-267 from:

Xrules

to:
Xrules
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)


Changed lines 280-282 from:

TriCluster

MicroCluster

to:
TriCluster
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)



MicroCluster
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)

Changed lines 303-304 from:

VOGUE

to:
VOGUE
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)


Changed lines 317-318 from:

Trellis

to:
Trellis
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)


Changed lines 331-333 from:

Snap

FlexSnap

to:
Snap
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)



FlexSnap
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)


Changed lines 355-356 from:

ContextShapes

to:
ContextShapes
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)


Changed lines 369-370 from:

PSIST

to:
PSIST
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)


Changed lines 384-403 from:

Datasets

to:
  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)


Datasets

  • Download
  • Relevant Publications
    • Karam Gouda and Mohammed J. Zaki, GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242. Nov 2005. (PDF) (BibTeX)
Changed line 21 from:
to:
Added lines 24-26:

Changed lines 40-41 from:

DMTL implements all algorithms following the vertical data mining approach. For itemsets, the implementation follows the Eclat approach (without diffsets). For sequences it follows the Spade approach. For trees it implements the SLEUTH framework, which allows one to mine embedded/induced and ordered/unordered trees. Finally, the graph mining framework uses a novel vertical approach.

to:

DMTL implements all algorithms following the vertical data mining approach. For itemsets, the implementation follows the Eclat approach (without diffsets). For sequences it follows the Spade approach. For trees it implements the SLEUTH framework, which allows one to mine embedded/induced and ordered/unordered trees. Finally, the graph mining framework uses a novel vertical approach.

Changed lines 52-53 from:
to:
Changed lines 56-58 from:

Itemset Mining

The section contains the code for mining all frequent itemsets, all closed itemsets, and all maximal frequent itemsets.

to:

Itemset Mining and Association Rules

The section contains the code for mining all frequent itemsets, all closed itemsets, and all maximal frequent itemsets. It also includes the code for constructing the concept lattice (or iceberg lattice) and generating nonredundant association rules.

Changed lines 66-67 from:

(1997-eclat), combined with the diffsets improvement (2003-diffsets). The code also contains the maxeclat, clique, and maxclique approaches mentioned in

to:

(1997-eclat), combined with the diffsets improvement (2003-diffsets). The code also contains the maxeclat, clique, and maxclique approaches mentioned in

Added line 70:
Changed lines 72-73 from:
to:
Changed lines 77-78 from:
to:
Changed lines 81-82 from:

Charm mines all the frequent closed itemsets as described in {[]}.

to:

Charm mines all the frequent closed itemsets as described in (2002-charm). Charm-L adds the ability to construct the entire frequent concept lattice, that is, it adds the links between all sub/super-concepts (or closed itemsets) (2005-charm:tkde). This ability is used to mine the non-redundant association rules (2004-nonredundant:dmkd).

  • Download
  • Relevant Publications
    • Mohammed J. Zaki and Ching-Jui Hsiao, Efficient Algorithms for Mining Closed Itemsets and their Lattice Structure. IEEE Transactions on Knowledge and Data Engineering, 17(4):462-478. Apr 2005. (PDF) (BibTeX)
    • Mohammed J. Zaki, Mining Non-Redundant Association Rules. Data Mining and Knowledge Discovery: An International Journal, 9(3):223-248. Nov 2004. (PDF) (BibTeX)
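
To make the notion of a closed itemset concrete, the brute-force sketch below checks the definition directly: an itemset is closed when it equals the intersection of all transactions that contain it, so no superset has the same support. The function names and the toy database are assumptions for illustration only. The sketch enumerates every candidate and therefore scales poorly; it shows what Charm computes, not how Charm computes it.

```python
from itertools import combinations


def support(itemset, db):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for transaction in db if itemset <= transaction)


def closure(itemset, db):
    """Intersection of all transactions containing itemset (support must be > 0)."""
    covering = [transaction for transaction in db if itemset <= transaction]
    return frozenset.intersection(*covering)


def closed_frequent(db, minsup):
    """Map each frequent closed itemset to its support (minsup >= 1)."""
    items = sorted(set().union(*db))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            sup = support(candidate, db)
            if sup >= minsup and closure(candidate, db) == candidate:
                result[candidate] = sup
    return result


def maximal_frequent(closed):
    """Maximal frequent itemsets: closed sets with no frequent closed superset."""
    return {x for x in closed if not any(x < y for y in closed)}


# Toy database (hypothetical): four transactions over items a, b, c.
toy_db = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
closed = closed_frequent(toy_db, minsup=2)
maximal = maximal_frequent(closed)
```

In this toy database {b} is frequent but not closed, because every transaction containing b also contains a; its closure {a, b} has the same support and is the itemset that gets reported.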
Added line 63:
  • Mohammed J. Zaki and Karam Gouda, Fast Vertical Mining Using Diffsets. In 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2003. (PDF) (BibTeX)
Changed lines 65-69 from:
  • Mohammed J. Zaki and Karam Gouda, Fast Vertical Mining Using Diffsets. In 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2003. (PDF) (BibTeX)
Charm
Charm-L
to:
Charm, Charm-L and Non-redundant Rule Generation

Charm mines all the frequent closed itemsets as described in {[]}.

Changed line 63 from:
  • Invalid BibTex Entry!
to:
  • Mohammed J. Zaki, Scalable Algorithms for Association Mining. IEEE Transactions on Knowledge and Data Engineering, 12(3):372-390. May/Jun 2000. (PDF) (BibTeX)
Changed lines 63-65 from:
  • Vineet Chaoji, Mohammad Al Hasan, Saeed Salem and Mohammed J. Zaki, An integrated, generic approach to pattern mining: data mining template library. Data Mining and Knowledge Discovery, 17(3):457-495. Dec 2008. (PDF) (BibTeX)
to:
  • Invalid BibTex Entry!
  • Mohammed J. Zaki and Karam Gouda, Fast Vertical Mining Using Diffsets. In 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2003. (PDF) (BibTeX)
Added line 52:

Changed lines 55-56 from:

Eclat uses the original vertical tidset approach for mining all frequent itemsets Invalid BibTex Entry!, combined with the diffsets improvement Invalid BibTex Entry!. The code also contains the maxeclat, clique, and maxclique approaches mentioned in Invalid BibTex Entry!.

to:

Eclat uses the original vertical tidset approach for mining all frequent itemsets (1997-eclat), combined with the diffsets improvement (2003-diffsets). The code also contains the maxeclat, clique, and maxclique approaches mentioned in (2000-eclat:tkde).
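
The vertical tidset idea can be sketched in a few lines. Each item is mapped to the set of transaction ids that contain it, and the support of a longer itemset is the size of the intersection of its members' tidsets. The diffset variant stores only the tids lost when an itemset is extended, so that support(XY) = support(X) - |t(X) - t(Y)|. All function names and the toy database below are assumptions for illustration; this is a minimal sketch, not the distributed Eclat code.

```python
def vertical(db):
    """Invert a list of transactions into item -> set of transaction ids."""
    tidsets = {}
    for tid, transaction in enumerate(db):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def extend(prefix, candidates, minsup, out):
    """Depth-first search; candidates are (item, tidset) pairs sharing prefix."""
    for index, (item, tids) in enumerate(candidates):
        itemset = prefix | {item}
        out[frozenset(itemset)] = len(tids)
        # Intersect with the tidsets of the remaining items in this class.
        next_candidates = []
        for other, other_tids in candidates[index + 1:]:
            common = tids & other_tids
            if len(common) >= minsup:
                next_candidates.append((other, common))
        if next_candidates:
            extend(itemset, next_candidates, minsup, out)


def mine(db, minsup):
    """Return {frequent itemset: support} using tidset intersections."""
    frequent_items = sorted(
        ((item, tids) for item, tids in vertical(db).items() if len(tids) >= minsup),
        key=lambda pair: pair[0],
    )
    out = {}
    extend(set(), frequent_items, minsup, out)
    return out


def diffset_support(tids_x, tids_y):
    """Support of XY from the diffset t(X) - t(Y) instead of the intersection."""
    return len(tids_x) - len(tids_x - tids_y)


# Toy database (hypothetical): five transactions over items a, b, c.
toy_db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = mine(toy_db, minsup=3)
```

On dense data the diffsets tend to be much smaller than the tidsets they replace, which is the motivation for the improvement cited above.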

Changed lines 70-72 from:

Spade

cSpade

to:

Spade
cSpade
Changed lines 78-80 from:

TreeMiner

TreeMiner-D

SLEUTH

to:
TreeMiner
TreeMiner-D

SLEUTH
Added lines 37-38:

DMTL implements all algorithms following the vertical data mining approach. For itemsets, the implementation follows the Eclat approach (without diffsets). For sequences it follows the Spade approach. For trees it implements the SLEUTH framework, which allows one to mine embedded/induced and ordered/unordered trees. Finally, the graph mining framework uses a novel vertical approach.

Changed lines 48-52 from:

Itemset Mining

Eclat

Charm

Charm-L

GenMax

to:

Itemset Mining

The section contains the code for mining all frequent itemsets, all closed itemsets, and all maximal frequent itemsets.

Eclat

Eclat uses the original vertical tidset approach for mining all frequent itemsets Invalid BibTex Entry!, combined with the diffsets improvement Invalid BibTex Entry!. The code also contains the maxeclat, clique, and maxclique approaches mentioned in Invalid BibTex Entry!.

  • Download
  • Relevant Publications
    • Vineet Chaoji, Mohammad Al Hasan, Saeed Salem and Mohammed J. Zaki, An integrated, generic approach to pattern mining: data mining template library. Data Mining and Knowledge Discovery, 17(3):457-495. Dec 2008. (PDF) (BibTeX)
Charm
Charm-L
GenMax
Changed lines 5-7 from:

Acknowledgments: We gratefully acknowledge the funding from the following agencies/programs that made the research possible:

to:

Acknowledgments

We gratefully acknowledge the funding from the following agencies/programs that made the research possible:

Deleted line 15:
Added lines 18-20:


Changed line 22 from:

(:toc-page Software/Software self=1:)

to:

(:*toc:)

Deleted line 23:
Changed lines 25-26 from:
to:


Changed lines 29-30 from:

complex and informative pattern types such as: Itemsets, Sequences, Trees and Graphs.

to:

complex and informative pattern types such as: Itemsets, Sequences, Trees and Graphs.

Changed lines 37-43 from:
Download:
  • Sourceforge Project Page
  • Sourceforge Download Page
Relevant Publications
  • Vineet Chaoji, Mohammad Al Hasan, Saeed Salem and Mohammed J. Zaki, An integrated, generic approach to pattern mining: data mining template library. Data Mining and Knowledge Discovery, 17(3):457-495. Dec 2008. (PDF) (BibTeX)
to:
  • Download
    • Sourceforge Project Page
    • Sourceforge Download Page
  • Relevant Publications
    • Vineet Chaoji, Mohammad Al Hasan, Saeed Salem and Mohammed J. Zaki, An integrated, generic approach to pattern mining: data mining template library. Data Mining and Knowledge Discovery, 17(3):457-495. Dec 2008. (PDF) (BibTeX)
Changed line 1 from:
to:
Changed lines 5-12 from:

(:*toc:)


Acknowledgments

We gratefully acknowledge the funding from the following agencies that made the research possible: National Science Foundation -- Information and Data Management program (IIS-0092978), and Next Generation Software program (EIA-0103708); and Department of Energy, Office of Science (DE-FG02-02ER25538)


to:

Acknowledgments: We gratefully acknowledge the funding from the following agencies/programs that made the research possible:

  • NSF Information and Data Management program (Grant: IIS-0092978)
  • NSF Next Generation Software program (Grant: EIA-0103708)
  • DOE Office of Science (Grant: DE-FG02-02ER25538)
  • CIA/NSA/DNI/IARPA Knowledge Discovery and Dissemination Program (Grants: EIA-0225715, ACI-0342411, CNS-0332960, CNS-0422637, CNS-0540232, IIS-0830218)
  • NSF Emerging Models and Technologies for Computation program (Grant: EMT-0829835)
  • NIH Biomedical Imaging and Bioengineering Institute (Grant: 1R01EB0080161-02)

(:toc-page Software/Software self=1:)

Changed lines 1-3 from:

(:*toc-float Table of Contents:) Disclaimer: The software is provided on an as is basis for research purposes. There is no additional support offered, nor are the author(s) or their institutions liable under any circumstances.

to:

Disclaimer: The software on this page is provided on an as is basis for research purposes. There is no additional support offered, nor are the author(s) or their institutions liable under any circumstances.

(:*toc:)

Changed lines 8-12 from:

Generic Pattern Mining

Data Mining Template Library

ItemsetMining

to:

Acknowledgments

We gratefully acknowledge the funding from the following agencies that made the research possible: National Science Foundation -- Information and Data Management program (IIS-0092978), and Next Generation Software program (EIA-0103708); and Department of Energy, Office of Science (DE-FG02-02ER25538)


Data Mining Template Library (DMTL)

DMTL is an open-source, high-performance, generic data mining toolkit, written in C++. It provides a collection of generic algorithms and data structures for mining increasingly complex and informative pattern types such as: Itemsets, Sequences, Trees and Graphs.

DMTL utilizes a generic data mining approach, where all aspects of mining are controlled via a set of properties. The kind of pattern to be mined, the kind of mining approach to use, and the kind of data types and formats to mine over are all specified as a list of properties. This provides tremendous flexibility to customize the toolkit for various applications.

Download:
  • Sourceforge Project Page
  • Sourceforge Download Page
Relevant Publications
  • Vineet Chaoji, Mohammad Al Hasan, Saeed Salem and Mohammed J. Zaki, An integrated, generic approach to pattern mining: data mining template library. Data Mining and Knowledge Discovery, 17(3):457-495. Dec 2008. (PDF) (BibTeX)

Itemset Mining

Changed line 1 from:

(:*toc-float:)

to:

(:*toc-float Table of Contents:)

Added line 12:

Charm-L

Added line 21:

TreeMiner-D

Changed lines 29-33 from:

Clusering

to:

Boolean Expression Mining

BLOSOM

Clustering

CLICKS

Added lines 37-43:

XML Classification

Xrules

Microarray Gene Expression Clustering

TriCluster

MicroCluster

Changed lines 58-62 from:

PSIST

to:

PSIST

Utilities

Datasets

Added line 1:

(:*toc-float:)

Deleted lines 3-5:

List of Software

(:*toc:)

Added lines 1-47:

Disclaimer: The software is provided on an as is basis for research purposes. There is no additional support offered, nor are the author(s) or their institutions liable under any circumstances.


List of Software

(:*toc:)


Generic Pattern Mining

Data Mining Template Library

ItemsetMining

Eclat

Charm

GenMax

Sequence Mining

Spade

cSpade

Tree Mining

TreeMiner

SLEUTH

Graph Mining

Origami

Musk

OSS

Clusering

Sparcl

Robin

Hidden Markov Models

VOGUE

Genome Scale Indexing

Trellis

Protein Structure Alignment

Snap

FlexSnap

Protein Docking

ContextShapes

Protein Indexing

PSIST