• Mohammed J. Zaki
    Rensselaer Polytechnic Institute
  • Jason T. L. Wang
    New Jersey Institute of Technology
  • Hannu T.T. Toivonen
    University of Helsinki

  • Henrik Bostrom, Stockholm University/Virtual Genetics, Sweden
  • Julie Dickerson, Iowa State University
  • Mark Embrechts, Rensselaer Polytechnic Institute
  • Hasan Jamil, Mississippi State University
  • George Karypis, University of Minnesota
  • Sun Kim, Indiana University
  • Simon Lin, Duke University
  • Hiroshi Mamitsuka, Kyoto University
  • Shinichi Morishita, University of Tokyo, Japan
  • William S. Noble, Columbia University
  • Zoran Obradovic, Temple University
  • David Page, University of Wisconsin
  • Laxmi Parida, IBM T.J. Watson Research Center
  • Srinivasan Parthasarathy, Ohio State University
  • William H. Piel, University at Buffalo, USA
  • Joerg Sander, University of Alberta, Canada
  • Bruce Shapiro, National Cancer Institute
  • Limsoon Wong, Labs for Information Technology, Singapore
  • Cathy Wu, Georgetown University Medical Center
  • Aidong Zhang, State University of New York at Buffalo

    BIOKDD 2002

    2nd Workshop on Data Mining in Bioinformatics

    in conjunction with

    8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    July 23-26, 2002
    Edmonton, Alberta, Canada

    The preliminary program is now available.

    Bioinformatics is the science of storing, extracting, organizing, analyzing, interpreting, and utilizing information from biological sequences and molecules. It has been fueled mainly by advances in DNA sequencing and mapping techniques. The Human Genome Project has produced an exponentially growing database of genetic sequences. Knowledge Discovery and Data Mining (KDD) techniques will play an increasingly important role in discovering sequence, structure, and functional patterns or models in large sequence databases. High-performance computing techniques are also becoming central to this task.

    Bioinformatics provides opportunities for developing novel mining methods. Some of the grand challenges in bioinformatics include protein structure prediction, homology search, multiple alignment and phylogeny construction, genomic sequence analysis, gene finding, and gene mapping, as well as applications in gene expression data analysis and drug discovery in the pharmaceutical industry. In protein structure prediction, one aims to determine the secondary, tertiary, and quaternary structure of a protein from its amino acid sequence. Homology search aims to detect increasingly distant homologues, i.e., proteins related by evolution from a common ancestor. Multiple alignment and phylogenetic tree construction are interrelated problems. Multiple alignment aligns a whole set of sequences to determine which subsequences are conserved; this works best when a phylogenetic tree of the related proteins is available. Gene finding aims to locate the genes in a DNA sequence. Finally, in gene mapping the task is to identify potential gene loci for a particular disease, typically based on genetic marker data from patients and controls.


    We solicit papers presenting important new insights and experiences in knowledge discovery and data mining from the modeling and simulation of complex biological systems. The workshop co-chairs are currently negotiating with a major publisher to publish a book containing selected chapters from the BIOKDD01 and BIOKDD02 workshops. Topics of interest lie at the intersection of KDD and bioinformatics. They include, but are not limited to, the following:

    Knowledge discovery and data mining:

  • New Mining Algorithms
  • Knowledge Representation
  • Database Support
  • Data Preprocessing and Cleaning
  • Feature Selection, Analysis and Visualization
  • Machine Learning and Pattern Recognition
  • Neural, Rough, Fuzzy and Hybrid Techniques
  • Hidden Markov Models
  • Bayesian Approaches
  • High Performance Computing
    Bioinformatics:

  • Molecular Sequence Analysis
  • Recognition of Genes and Regulatory Elements
  • Protein Structure Prediction
  • Interpretation of Large-Scale Gene Expression Data
  • Gene Mapping
  • Whole Genome Comparative Analysis
  • Modeling of Biochemical Pathways
  • Drug Design and Combinatorial Libraries
    Important Dates

    May 15, 2002: Submissions Due
    June 15, 2002: Acceptance Notification
    June 22, 2002: Camera-Ready Copy Due
    July 23, 2002: Workshop Day

    Paper Format

    Submissions on the above and related topics of bioinformatics and data mining are invited. We also encourage submissions that present early-stage research work, software applications, and solutions. Papers should be no more than 10 pages, single-spaced, in a 10-point font, with one-inch margins on all sides. The contact author and email address should be specified on the title page.

    Electronic Submission

    Electronic submissions in either PDF or PostScript (PS) format are strongly encouraged.

    Please e-mail electronic submissions with subject "BIOKDD2002" to:

    Hard Copy Submission

    If electronic submission is not possible, send five hard copies to:
    Jason Wang
    College of Computing Sciences,
    New Jersey Institute of Technology,
    Newark, NJ 07102 USA

    Maintained by: Mohammed J. Zaki