Workshop on Large-Scale Parallel KDD Systems

August 15th, 1999, San Diego, CA, USA

in conjunction with

ACM SIGKDD International Conference on

Knowledge Discovery and Data Mining (KDD99)

With data being collected at an unprecedented rate in almost all fields of human endeavor, there is an emerging economic and scientific need to extract useful information from it. Many companies (e.g., FedEx, UPS, Walmart) already have data warehouses in the terabyte range. Implementation of data mining ideas in high-performance parallel and distributed computing environments is thus crucial for ensuring system scalability and interactivity.

The goal of this workshop is to bring researchers and practitioners together in a setting where they can discuss the design, implementation, and deployment of large-scale, parallel knowledge discovery (PKD) systems, which can manipulate data taken from very large enterprise or scientific (e.g., space missions, the Human Genome Project) databases, regardless of whether the data are located centrally or are globally distributed. Relevant topics for the workshop include:

How to develop a rapid-response, scalable, and parallel knowledge discovery system that supports global organizations with terabytes of data.
How to address some of the challenges facing current state-of-the-art data mining tools. These challenges include relieving the user from time- and volume-constrained tool sets, evolving knowledge stores with new knowledge effectively, acquiring data elements from heterogeneous sources such as the Web or other repositories, and enhancing the PKD process by incrementally updating the knowledge stores.
How to leverage high-performance parallel and distributed techniques in all phases of KDD, such as initial data selection, cleaning and preprocessing, transformation, selection and application of the data mining task and algorithm, pattern evaluation, and management of discovered knowledge, and how to provide tight coupling between the mining engine and the database/file server.
How to facilitate user interaction and usability, allow the representation of domain knowledge, and maximize understanding during and after the process. That is, how to build an adaptable PKD engine that supports business decisions and product creation and evolution, and that turns information into usable or actionable knowledge.

Online Proceedings

8:40am, Opening
8:45-9:25am, Invited Talk: Collection-Based Data Management
Reagan Moore, San Diego Supercomputer Center, USA
9:25-10:15am, Session I: Mining Frameworks

A high performance implementation of the data space transfer protocol (DSTP)

S. Bailey, E. Creel, R. Grossman, S. Gutti, H. Sivakumar, University of Illinois-Chicago, USA

Active data mining in a distributed setting

S. Parthasarathy, S. Dwarkadas, M. Ogihara, University of Rochester, USA
10:15-10:30am, Coffee Break
10:30-11:10am, Invited Talk: Large-Scale Data Mining Applications: Requirements and Architectures
Umeshwar Dayal, Hewlett-Packard Corp., USA
11:10am-12:00pm, Session II: Association Rules

Parallel branch-and-bound graph search for correlated association rules

S. Morishita, A. Nakaya, University of Tokyo, Japan

Parallel algorithms for mining association rule mining on large scale PC Cluster

T. Shintani and M. Kitsuregawa, University of Tokyo, Japan
12:00-2:00pm, Lunch Break
2:00-2:40pm, Invited Talk: Integrated Delivery of Large-Scale Data Mining Systems
Graham Williams, CSIRO, Australia
2:40-3:30pm, Session III: Clustering and Sequences

A data clustering algorithm on distributed memory machines

I. S. Dhillon and D. S. Modha, IBM Almaden Research Center, USA

Parallel sequence mining on SMP machines

M. Zaki, Rensselaer Polytechnic Institute, USA
3:30-3:45pm, Coffee Break
3:45-4:25pm, Invited Talk: Communicating Data Mining: Issues and Challenges in Wide Area Distributed Data Mining
Bob Grossman and Yike Guo, University of Illinois-Chicago, USA and Imperial College, UK
4:25-5:15pm, Session IV: Classification

Efficient parallel classification using dimensional aggregates

S. Goil and A. Choudhary, Northwestern University, USA

Learning rules from distributed data

L.O. Hall, N. Chawla, K.W. Bowyer, and W. P. Kegelmeyer, University of South Florida and Sandia National Labs., USA
5:15-6:15pm, Panel: Large-Scale Data Mining: Where is it Headed?

Vipin Kumar, University of Minnesota
Ron Musick, Lawrence Livermore National Labs
Foster Provost, Bell Atlantic
Mohammed Zaki (moderator), Rensselaer Polytechnic Institute
6:15pm, Closing

Registration:

All registrants for the SIGKDD conference are eligible to participate in the workshop. There is no separate registration fee for the workshop, but workshop attendance is by invitation only, and the number of participants will be limited to 60.

To register for the workshop, please send an email to one of the workshop chairs expressing your interest in the workshop. A brief statement of your research/work interests will be helpful. Of course, you must also register for the main SIGKDD conference. A list of the registrants so far is available: List of Registrants.

Workshop Chairs: (biographies)

Dr. Mohammed J. Zaki
Computer Science Department
Rensselaer Polytechnic Institute
Troy NY 12180
zaki.AT.cs.rpi.edu

Dr. Ching-Tien (Howard) Ho
IBM Almaden Research Center
650 Harry Road
San Jose CA 95120
ho@almaden.ibm.com

Program Committee:

David Cheung, University of Hong Kong, Hong Kong
Alok Choudhary, Northwestern University
Alex A. Freitas, PUC-PR, Brazil
Robert Grossman, University of Illinois-Chicago
Yike Guo, Imperial College, UK
Hillol Kargupta, Washington State University
Masaru Kitsuregawa, University of Tokyo, Japan
Vipin Kumar, University of Minnesota
Reagan Moore, San Diego Supercomputer Center
Ron Musick, Lawrence Livermore National Lab
Srini Parthasarathy, University of Rochester
Sanjay Ranka, University of Florida
Arno Siebes, CWI, Netherlands
David Skillicorn, Queen's University, Canada
Paul Stolorz, Jet Propulsion Lab
Graham Williams, CSIRO, Australia
