Open Source Data Mining Workshop
on Frequent Pattern Mining Implementations

in conjunction with ACM SIGKDD 2005

Proceedings of the OSDM Workshop on the ACM Digital Library


Over the past decade, tremendous progress has been made in data mining methods such as clustering, classification, and frequent pattern mining. Unfortunately, advanced implementations are often not made publicly available, so the results cannot be independently verified. This hampers rapid advances in the field. There is thus a critical need for open source implementations of important data mining methods. This workshop is the first such meeting place to discuss open source data mining methods. Since the scope of such a workshop can be rather broad, we focus our attention in the first year on Frequent Pattern Mining (FPM) problems. In subsequent years we will focus on open source implementations for other data mining problems such as clustering, classification, and outlier detection.

Frequent pattern mining is a core field of research in data mining, encompassing the discovery of patterns such as itemsets, sequences, trees, graphs, and many other structures. Varied approaches to these problems appear in numerous papers across all data mining conferences. Generally speaking, the problem involves identifying items, products, symptoms, characteristics, and so forth, that often occur together in a given dataset. Because FPM is a fundamental operation in data mining, its algorithms can serve as building blocks for other, more sophisticated data mining processes. During the last decade, a huge number of algorithms have been developed to solve FPM problems efficiently.
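
As a rough illustration of the itemset case of the problem described above, the following Python sketch counts how many transactions contain each small itemset and keeps those that meet a minimum support. It is a brute-force example for explanation only, not an algorithm from any workshop submission; the function name `frequent_itemsets`, the `min_support` and `max_size` parameters, and the toy transactions are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: count} for itemsets found in at least
    min_support transactions (hypothetical helper, brute force)."""
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # Keep only itemsets that occur often enough.
    return {s: c for s, c in counts.items() if c >= min_support}


# Toy dataset (invented for illustration).
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

result = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

Enumerating every subset of every transaction grows exponentially with transaction length, which is one reason so many specialised FPM algorithms exist; practical implementations prune the search space rather than counting all candidates as this sketch does.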

Submissions consist of source code together with a paper that describes the implemented FPM algorithm and provides a performance study on publicly provided datasets. We request that the paper also provide a deep analysis of the proposed techniques, presenting results on the performance of the algorithm with and without each of the techniques or optimizations used and, if appropriate, an explanation of why the submitted algorithm performs better than existing implementations or algorithms.

The workshop participants will be invited to come and discuss the submissions; there will be a heavy focus on critical evaluation, i.e., what the limitations are, under what conditions the algorithm works well, why it fails in other cases, and what the open areas are. One outcome of the workshop will be to outline the focus for research on new problems in the field. Although there will be no performance contest, we believe that the open source nature of the workshop encourages authors to accurately and honestly compare their algorithms with others, and vice versa.
Webmaster: Bart Goethals