WORKSHOP CO-CHAIRS:

Mohammed J. Zaki

Rensselaer Polytechnic Institute
(zaki.AT.cs.rpi.edu)

Vipin Kumar

University of Minnesota
(kumar@cs.umn.edu)

David Skillicorn

Queen's University, Canada
(skill@cs.queensu.ca)

PROGRAM COMMITTEE:

Philip Chan, Florida Institute of Technology

David Cheung, University of Hong Kong, Hong Kong

Alok Choudhary, Northwestern University

Alex A. Freitas, PUC-PR (Pontifical Catholic University of Parana), Brazil

Johannes Gehrke, Cornell University

Robert Grossman, University of Illinois-Chicago

Yike Guo, Imperial College, UK

Howard Ho, IBM Almaden Research Center

Chandrika Kamath, Lawrence Livermore National Lab

Hillol Kargupta, Washington State University

Masaru Kitsuregawa, University of Tokyo, Japan

Bill Maniatty, State University of New York-Albany

Ron Musick, Lawrence Livermore National Lab

Yi Pan, Georgia State University

Srini Parthasarathy, Ohio State University

Foster Provost, New York University

Arno Siebes, CWI, Netherlands

Domenico Talia, ISI-CNR, Rende, Italy

Albert Zomaya, University of Western Australia

PDDM, 2001

4th International
Workshop on Parallel and Distributed Data Mining
April 27, 2001
San Francisco, CA, USA

in conjunction with

15th International Parallel and Distributed Processing Symposium (IPDPS'2001)

Workshop History: This is the 4th workshop on this theme, held annually in conjunction with the IPDPS conference. The first three workshops went under the name "High Performance Data Mining" and were held in Orlando (HPDM'98), San Juan (HPDM'99) and Cancun (HPDM'00). In keeping with the growing popularity and international scope of this field, this workshop has been renamed "International Workshop on Parallel and Distributed Data Mining".

As the volume of data increases, it is clear that both parallel and distributed data mining techniques are required to make the whole knowledge discovery process scalable and interactive. This workshop will target papers on high performance parallel and distributed methods, as well as mining on distributed and heterogeneous datasets. Topics of interest include:

Efficient, scalable, disk-based, parallel and distributed algorithms for large-scale data mining tasks.
New algorithms for common data mining methods such as

Pre-processing and post-processing operations like sampling, feature selection, data reduction and transformation, rule grouping and pruning, etc.
Incremental, exploratory and interactive mining
Meta-mining, coping with distributed and/or heterogeneous datasets.
Integration of mining with parallel/distributed databases and

Mining non-traditional datasets, such as large scientific databases.
Frameworks for KDD systems, and parallel or distributed mining.
Agent based approaches for PDDM.
Applications of PDDM in business, science, engineering, medicine, and other disciplines.
Theoretical foundation of PDDM.

WORKSHOP SCHEDULE:

9:00-9:15 Opening Remarks
9:15-10:00 Keynote Talk
10:00-10:30 Coffee Break
10:30-12:00 Session I
12:00-13:30 Lunch
13:30-14:15 Invited Talk
14:15-15:15 Session II
15:15-15:20 Concluding Remarks
15:20-15:30 Coffee Break

SESSION INFORMATION:

Keynote Talk: Scalable Parallel Data Mining for High-Dimensional Data, Alok Choudhary, Northwestern University
Abstract: Large-scale data analysis and data mining on warehouses (where huge amounts of time-varying observational, transactional, or simulation data are stored) pose many challenges. The stored data is typically multidimensional, with a large number of dimensions, and in many cases it is highly sparse. Parallel processing techniques have become important for enabling the use of larger data sets and reducing the time for analysis and knowledge discovery. In this talk, I will briefly present PARSIMONY, a system that provides an infrastructure as well as scalable algorithms for analysis and mining of large, multidimensional data. In particular, I will present MAFIA, a scalable parallel clustering algorithm for high-dimensional data.

Session I:

Efficient Data Mining: Scripting and Scalable Parallel Algorithms, Peter Christen, Markus Hegland, Ole M. Nielsen, Stephen Roberts, Peter Strazdins, Tatiana Semenova, Irfan Altas

An efficient association mining implementation on clusters of SMP, Ruoming Jin and Gagan Agrawal

Implementation and performance evaluation of dynamic scheduling for parallel decision tree generation, Kazuto Kubota, Akihiko Nakase and Shigeru Oyanagi

Invited Talk: Ubiquitous Mining of Distributed Data, Hillol Kargupta, University of Maryland Baltimore County
Abstract: Knowledge discovery and data mining deal with the problem of extracting interesting associations, classifiers, clusters, and other patterns from data. The emergence of network-based environments has introduced an important new dimension to this problem: distributed sources of data and computing. The advent of laptops, palmtops, handhelds, and wearable computers is making ubiquitous access to large quantities of distributed data a reality. Advanced analysis of distributed data for extracting useful knowledge is the next natural step in the increasingly connected world of ubiquitous computing. However, this will not come for free; it will introduce additional cost due to communication, computation, and security, among others. Distributed data mining (DDM) offers the capability to analyze distributed data by minimizing this cost to maintain the ubiquitous presence. This talk will explain the Collective Data Mining (CDM) approach to DDM, which offers a collection of different scalable distributed data analysis techniques. It will present an overview of the CDM technology and its applications.

Session II:

Towards Network-Aware Data Mining, Srinivasan Parthasarathy

Incremental Quantitative Rule Derivation by Multidimensional Data Partitioning, Junping Sun

Maintained by: Mohammed J. Zaki <zaki.AT.cs.rpi.edu>
