BIOKDD, 2001

Workshop on Data Mining in Bioinformatics

August 26, 2001
San Francisco, CA, USA

in conjunction with

7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Bioinformatics is the science of storing, extracting, organizing, analyzing, interpreting, and utilizing information from biological sequences and molecules. Knowledge Discovery and Data Mining (KDD) techniques will play an increasingly important role in the analysis and discovery of sequence, structure, and functional patterns or models from large sequence databases.

This workshop aims to present the latest results in this exciting area at the intersection of biology and KDD. More detailed topics of interest are listed in the Call for Papers.

Preliminary proceedings will be published by ACM for distribution during the workshop. A post-workshop book is being planned with Springer-Verlag, containing revised versions of the workshop papers as well as invited chapters from other leading researchers.


9:00-9:05: Opening Remarks

9:05-10:05 Session I (Invited Talk & Gene Expression)

  • (40 mins) Invited Talk: Determination of RNA folding pathway functional intermediates using a massively parallel genetic algorithm, Bruce Shapiro, National Cancer Institute, USA
  • (20 mins) Extracting knowledge from gene expression data: A case study of Batten Disease, Simon Lin, Sumeer Dhar, Rose-Mary Boustany, Duke University Medical Center, USA

10:05-10:30 Coffee Break

10:30-11:30 Session II (Microarrays)
(20 mins each)

  • Mining microarray expression data for classifier gene-cores, Goutham Kurra, Wen Niu and Raj Bhatnagar, University of Cincinnati, USA
  • Classification of genes using probabilistic models of microarray expression profiles, Paul Pavlidis, Christopher Tang, William S. Noble, Columbia University, USA
  • Analysis of an associative memory neural network for pattern identification in gene expression data, Silvio Bicciato, Mario Pandin, Giuseppe Didone' and Carlo Di Bello, University of Padova and Cittadella Hospital, Italy

11:30-12:10 Session III (Sequence Assembly)
(20 mins each)

  • A learning algorithm for string assembly, Mark Goldberg, Darren Lim, and Malik Magdon-Ismail, RPI, USA
  • A probabilistic approach to sequence assembly validation, Sun Kim, Li Liao, and Jean-Francois Tomb, DuPont, USA

12:10-1:30 Lunch

1:30-2:50 Session IV (Invited Talk & Proteins)

  • (40 mins) Invited Talk: Shared challenges in data mining and computational biology, Charles Elkan, University of California, San Diego, USA
  • (20 mins) Learning to recognize brain specific proteins based on low-level features from on-line prediction servers, Mikael Huss, Henrik Bostrom, Lars Asker and Joakim Coster, Virtual Genetics Laboratory, Sweden
  • (20 mins) Investigation of bagging-like effects and decision trees versus neural nets in protein secondary structure prediction, Nitesh Chawla, Thomas Moore, Kevin Bowyer, Lawrence Hall, Clayton Springer, Philip Kegelmeyer, University of South Florida and Sandia National Labs., USA

2:50-3:00 Break (uncatered)

3:00-4:00 Session V (Sequence Modeling & Clustering)
(20 mins each)

  • Maximum entropy methods for biological sequence modeling, Eugen Buehler, Lyle Ungar, University of Pennsylvania, USA
  • Hierarchical cluster analysis of SAGE data for cancer profiling, Raymond T. Ng, Jorg Sander, and Monica C. Sleumer, University of British Columbia, Canada 
  • A scalable algorithm for clustering protein sequences, Valery Guralnik and George Karypis, University of Minnesota, USA

4:00-4:05: Closing Remarks


  • Mohammed J. Zaki, Rensselaer Polytechnic Institute
  • Hannu T.T. Toivonen, University of Helsinki and Nokia Research Center
  • Jason T. L. Wang, New Jersey Institute of Technology


  • Chuck Baldwin, Lawrence Livermore National Laboratory 
  • Chris Bystroff, Rensselaer Polytechnic Institute 
  • Shi-Kuo Chang, University of Pittsburgh 
  • Wesley W. Chu, University of California, Los Angeles 
  • Diane J. Cook, University of Texas at Arlington 
  • Charles Elkan, University of California, San Diego 
  • Janice Glasgow, Queen's University, Canada 
  • Richard Hughey, University of California, Santa Cruz 
  • Hasan Jamil, Mississippi State University 
  • Minoru Kanehisa, Kyoto University 
  • Simon M. Lin, Duke University Medical Center 
  • Jacob V. Maizel, Jr., National Institutes of Health 
  • Sharad Mehrotra, University of California at Irvine 
  • Shinichi Morishita, University of Tokyo 
  • Jane Richardson, Duke University 
  • Isidore Rigoutsos, IBM Thomas J. Watson Research Center 
  • Bruce Shapiro, National Institutes of Health 
  • Vassilis J. Tsotras, University of California, Riverside 
  • Alex Tuzhilin, New York University/Stern School of Business 
  • Jeff Vitter, Duke University 
  • Cathy H. Wu, Georgetown University Medical Center 
  • Michael Zucker, Rensselaer Polytechnic Institute 

Maintained by: Mohammed J. Zaki