Workshop on Large-Scale Parallel KDD Systems
August 15th, 1999, San Diego, CA, USA
in conjunction with
ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD99)

Online Proceedings on ACM Server

With the unprecedented rate at which data is being collected today in almost all fields of human endeavor, there is an emerging economic and scientific need to extract useful information from it. Many companies already have data warehouses in the terabyte range (e.g., FedEx, UPS, Walmart). Implementation of data mining ideas in high-performance parallel and distributed computing environments is thus crucial for ensuring system scalability and interactivity.

The goal of this workshop is to bring researchers and practitioners together in a setting where they can discuss the design, implementation, and deployment of large-scale, parallel knowledge discovery (PKD) systems, which can manipulate data taken from very large enterprise or scientific (e.g., space missions, the Human Genome Project) databases, regardless of whether the data are located centrally or are globally distributed. Relevant topics for the workshop include:

  1. How to develop a rapid-response, scalable, and parallel knowledge discovery system that supports global organizations with terabytes of data.
  2. How to address some of the challenges facing current state-of-the-art data mining tools. These challenges include relieving the user from time- and volume-constrained tool sets, evolving knowledge stores with new knowledge effectively, acquiring data elements from heterogeneous sources such as the Web or other repositories, and enhancing the PKD process by incrementally updating the knowledge stores.
  3. How to leverage high-performance parallel and distributed techniques in all the phases of KDD, such as initial data selection, cleaning and preprocessing, transformation, data mining task and algorithm selection and application, pattern evaluation, management of discovered knowledge, and tight coupling between the mining engine and the database/file server.
  4. How to facilitate user interaction and usability, allowing the representation of domain knowledge, and to maximize understanding during and after the process. That is, how to build an adaptable PKD engine that supports business decisions and product creation and evolution, and that turns information into usable or actionable knowledge.

Online Proceedings

8:40am, Opening

8:45-9:25am, Invited Talk: Collection-Based Data Management
Reagan Moore, San Diego Supercomputer Center, USA

9:25-10:15am, Session I: Mining Frameworks
10:15-10:30am, Coffee Break

10:30-11:10am, Invited Talk: Large-Scale Data Mining Applications: Requirements and Architectures
Umeshwar Dayal, Hewlett-Packard Corp., USA

11:10am-12:00pm, Session II: Association Rules
12:00-2:00pm, Lunch Break

2:00-2:40pm, Invited Talk: Integrated Delivery of Large-Scale Data Mining Systems
Graham Williams, CSIRO, Australia

2:40-3:30pm, Session III: Clustering and Sequences
3:30-3:45pm, Coffee Break

3:45-4:25pm, Invited Talk: Communicating Data Mining: Issues and Challenges in Wide Area Distributed Data Mining
Bob Grossman and Yike Guo, University of Illinois-Chicago, USA and Imperial College, UK

4:25-5:15pm, Session IV: Classification
5:15-6:15pm, Panel: Large-Scale Data Mining: Where is it Headed?

6:15pm, Closing


All registrants to the SIGKDD conference are eligible to participate in the workshop. There is no separate registration fee for the workshop, but workshop attendance is by invitation only, and the number of participants will be limited to 60.

To register for the workshop, please send an email to one of the workshop chairs expressing your interest in the workshop. A brief statement of your research/work interests will be helpful. Of course, you must also register for the main SIGKDD conference. A list of the registrants so far is available.

Workshop Chairs: (biographies)

Dr. Mohammed J. Zaki
Computer Science Department
Rensselaer Polytechnic Institute
Troy NY 12180

Dr. Ching-Tien (Howard) Ho
IBM Almaden Research Center
650 Harry Road
San Jose CA 95120

Program Committee:

David Cheung, University of Hong Kong, Hong Kong
Alok Choudhary, Northwestern University
Alex A. Freitas, PUC-PR, Brazil
Robert Grossman, University of Illinois-Chicago
Yike Guo, Imperial College, UK
Hillol Kargupta, Washington State University
Masaru Kitsuregawa, University of Tokyo, Japan
Vipin Kumar, University of Minnesota
Reagan Moore, San Diego Supercomputer Center
Ron Musick, Lawrence Livermore National Lab
Srini Parthasarathy, University of Rochester
Sanjay Ranka, University of Florida
Arno Siebes, CWI, Netherlands
David Skillicorn, Queen's University, Canada
Paul Stolorz, Jet Propulsion Lab
Graham Williams, CSIRO, Australia
