Special Issue on Parallel and Distributed Data Mining
(Distributed and Parallel Databases: An International Journal)


Guest Editors

Mohammed J. Zaki
Computer Science Department
Rensselaer Polytechnic Institute
Troy, NY 12180
e-mail: zaki.AT.cs.rpi.edu
URL: www.cs.rpi.edu/~zaki

Yi Pan
Department of Computer Science
Georgia State University
University Plaza
Atlanta, GA 30303
e-mail: pan@cs.gsu.edu
URL: http://www.cs.gsu.edu/~cscyip/

This special issue has been published; see Distributed and Parallel Databases (Vol. 11, No. 2, March 2002) for its contents.


Parallel and Distributed Data Mining


With data now being collected and stored electronically at an unprecedented rate in almost every field of human endeavor, efficiently mining useful information from that data is becoming both a growing scientific challenge and a pressing economic need.

While data mining has its roots in the traditional fields of machine learning and statistics, the sheer volume of data today poses the most serious problem. For example, many companies already have data warehouses in the terabyte range (e.g., FedEx, UPS, Walmart). Similarly, scientific data is reaching gigantic proportions (e.g., NASA space missions, the Human Genome Project). Traditional methods typically assumed that the data is memory-resident, an assumption that is no longer tenable. Implementing data mining methods in high-performance parallel and distributed computing environments is thus becoming crucial for ensuring system scalability and interactivity as data continues to grow inexorably in size and complexity.

This special issue of Distributed and Parallel Databases will address data mining methods and processes from both an algorithmic and systems perspective.

The algorithmic aspects involve the design of efficient, scalable, disk-based, parallel and distributed algorithms for large-scale data mining tasks. The challenge is to develop methods that scale to thousands of attributes and billions of transactions. The techniques of interest span all major classes of data mining methods, such as association rules, sequences, classification, clustering, and deviation detection, as well as pre-processing and post-processing operations such as sampling, feature selection, data reduction and transformation, rule grouping and pruning, exploratory and interactive browsing, and meta-level mining.

The systems issues will focus on actual implementation of the algorithms on a variety of parallel hardware platforms, including shared-memory systems (SMPs), distributed-memory systems, networks of workstations, hybrid systems consisting of a cluster of SMPs, and geographically distributed systems. The key challenges include improving load balancing, improving locality, eliminating false sharing on SMPs, minimizing synchronization, minimizing communication, maximizing the accuracy of distributed models, integrating heterogeneous sources, and finding appropriate data layouts. Papers dealing with the integration of mining with databases and data warehousing, as well as successful applications, are also sought.


Submission Instructions


Authors are encouraged to submit high-quality, original work that has neither appeared in, nor is under consideration by, other journals. Submissions should be in 12pt font with 1.5 line spacing, and should not exceed 30 pages (including all figures, tables, and references).

Authors must submit five hard copies of the paper directly to the Kluwer office. The cover letter must include the title of the special issue and the title of the journal. Send the hard copies to:
Kluwer Academic Publishers,
Journals Editorial Office,
101 Philip Drive, Assinippi Park,
Norwell, MA 02061, U.S.A.

Important Dates:

Papers Due: February 12th, 2001
Acceptance Notification: April 30th, 2001
Camera Ready Papers Due: June 15th, 2001
 

