WORKSHOP CO-CHAIRS:

Mohammed J. Zaki

Rensselaer Polytechnic Institute (zaki.AT.cs.rpi.edu)

Hannu T.T. Toivonen

University of Helsinki and Nokia Research Center (Hannu.TT.Toivonen@nokia.com)

Jason T. L. Wang

New Jersey Institute of Technology (jason@cis.njit.edu)

PROGRAM COMMITTEE:

Chuck Baldwin, Lawrence Livermore National Laboratory

Chris Bystroff, Rensselaer Polytechnic Institute

Shi-Kuo Chang, University of Pittsburgh

Wesley W. Chu, University of California, Los Angeles

Diane J. Cook, University of Texas at Arlington

Charles Elkan, University of California, San Diego

Janice Glasgow, Queen's University, Canada

Richard Hughey, University of California, Santa Cruz

Hasan Jamil, Mississippi State University

Minoru Kanehisa, Kyoto University

Simon M. Lin, Duke University Medical Center

Jacob V. Maizel, Jr., National Institutes of Health

Sharad Mehrotra, University of California at Irvine

Shinichi Morishita, University of Tokyo

Jane Richardson, Duke University

Isidore Rigoutsos, IBM Thomas J. Watson Research Center

Bruce Shapiro, National Institutes of Health

Vassilis J. Tsotras, University of California, Riverside

Alex Tuzhilin, New York University/Stern School of Business

Jeff Vitter, Duke University

Cathy H. Wu, Georgetown University Medical Center

Michael Zucker, Rensselaer Polytechnic Institute

BIOKDD, 2001

Workshop on Data Mining in Bioinformatics

in conjunction with

7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 26-29, 2001
San Francisco, CA, USA
(KDD'2001)

Bioinformatics is the science of storing, extracting, organizing, analyzing, interpreting, and utilizing information from biological sequences and molecules. It has been fueled mainly by advances in DNA sequencing and mapping techniques. The Human Genome Project has resulted in an exponentially growing database of genetic sequences. Knowledge Discovery and Data Mining (KDD) techniques will play an increasingly important role in the analysis and discovery of sequence, structure, and functional patterns or models from large sequence databases. High-performance techniques are also becoming central to this task.

Bioinformatics provides opportunities for developing novel mining methods. Some of the grand challenges in bioinformatics include protein structure prediction, homology search, multiple alignment and phylogeny construction, genomic sequence analysis, gene finding, and gene mapping, as well as applications in gene expression data analysis and drug discovery in the pharmaceutical industry. In protein structure prediction, one is interested in determining the secondary, tertiary, and quaternary structure of proteins, given their amino acid sequence. Homology search aims at detecting increasingly distant homologues, i.e., proteins related by evolution from a common ancestor. Multiple alignment and phylogenetic tree construction are inter-related problems. Multiple alignment aims at aligning a whole set of sequences to determine which subsequences are conserved. This works best when a phylogenetic tree of related proteins is available. Gene finding aims at locating the genes in a DNA sequence. Finally, in gene mapping the task is to identify potential gene loci for a particular disease, typically based on genetic marker data from patients and controls.

WORKSHOP TOPICS

We solicit papers with important new insights and experiences on knowledge discovery and data mining from the modeling and simulation of complex biological systems. Topics of interest lie at the intersection of KDD and Bioinformatics. They include, but are not limited to, the following:

Knowledge discovery and data mining:

New Mining Algorithms

Knowledge Representation

Database Support

Data Preprocessing and Cleaning

Feature Selection, Analysis and Visualization

Machine Learning and Pattern Recognition

Neural, Rough, Fuzzy and Hybrid Techniques

Hidden Markov Models

Bayesian Approaches

High Performance Computing

Bioinformatics:

Molecular Sequence Analysis

Recognition of Genes and Regulatory Elements

Protein Structure Prediction

Interpretation of Large-Scale Gene Expression Data

Gene Mapping

Whole Genome Comparative Analysis

Modeling of Biochemical Pathways

Drug Design and Combinatorial Libraries

Special Issue: Authors submitting papers to this workshop are also encouraged to submit papers for an independent review and possible publication in the forthcoming special issue on Bioinformatics and Biological Data Management of Information Systems.

Important Dates

May 15, 2001: Submissions Due
June 15, 2001: Acceptance Notification
July 16, 2001: Camera Ready Copy Due
August 26, 2001: Workshop Day

Paper Format

Submissions on the above and related topics of bioinformatics and data mining are invited. We also encourage submissions that present early stages of research work, software applications, and solutions. Papers should not be more than 10 pages in 10 point font and single-spaced, with one-inch margins on all sides. The contact author and email address should be specified on the title page.

Electronic Submission

Electronic submissions in either PDF or PS format are strongly encouraged.

Please e-mail electronic submissions with subject "BIOKDD2001" to:

zaki.AT.cs.rpi.edu

Hard Copy Submission

If electronic submission is not possible, send 5 hard copies to:
Mohammed J. Zaki
Department of Computer Science
Rensselaer Polytechnic Institute
Troy NY 12180
USA

Maintained by: Mohammed J. Zaki <zaki@cs.rpi.edu>
