ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'07)

7th International Workshop on Data Mining in Bioinformatics (BIOKDD '07)
August 12, 2007, San Jose, CA, USA

The BIOKDD '07 workshop was successfully completed on August 12, 2007. The electronic proceedings are available online.

We are editing a new book, "BIOKDD: Knowledge Discovery and Data Mining in Biology", based predominantly on extended work selected from previous BIOKDD contributors. However, if you can write well for a lay audience, have a unique computational technique that works, or want to share unique large-scale discovery results obtained using purely computational techniques, you may consider contributing a BIOKDD book chapter. If so, please send us a presubmission inquiry by email as soon as possible.


Bioinformatics is the science of managing, mining, and interpreting information from biological data. Various genome projects have contributed to exponential growth in DNA and protein sequence databases. Advances in high-throughput technologies such as microarrays and mass spectrometry have further given rise to the fields of functional genomics and proteomics, in which one can quantitatively monitor the presence of multiple genes, proteins, metabolites, and compounds in a given biological state. The ongoing influx of these data, the existence of biological answers underlying the observed data despite noise, and the gap between data collection and knowledge curation have collectively created exciting opportunities for data mining researchers.

While tremendous progress has been made over the years, many fundamental problems in bioinformatics, such as protein structure prediction, gene-environment interaction, and regulatory pathway mapping, remain open. Data mining will play an essential role in understanding these fundamental problems and in developing novel therapeutic and diagnostic solutions for post-genome medicine.

Workshop History (2001-2006)

Data Mining approaches seem ideally suited for Bioinformatics, since the field is data-rich but lacks a comprehensive theory of life's organization at the molecular level. The extensive databases of biological information create both challenges and opportunities for developing novel KDD methods. To highlight these avenues, we organized the Workshops on Data Mining in Bioinformatics (BIOKDD 2001-2006), held annually in conjunction with the ACM SIGKDD Conference.
This will be the 7th year for the workshop.

Past workshops attracted 50-100 participants from academia, industry, and government labs, underscoring the surge of interest in this exciting and rapidly expanding field. Each workshop program included 10-11 contributed papers and 1-2 invited talks.
Information on past workshops is available at:

Call for Papers

The goal of this workshop is to encourage KDD researchers to take on the numerous challenges that Bioinformatics offers. The workshop will feature invited talks from noted experts in the field, and the latest data mining research in bioinformatics. We encourage papers that propose novel data mining techniques for post-genome bioinformatics studies in areas such as:

  • Phylogenetics and comparative genomics
  • DNA microarray data analysis
  • RNAi and microRNA analysis
  • Protein/RNA structure prediction
  • Sequence and structural motif finding
  • Modeling of biological networks and pathways
  • Statistical learning methods in bioinformatics
  • Computational proteomics
  • Computational biomarker discovery
  • Computational drug discovery
  • Biomedical text mining
  • Biological data management techniques
  • Semantic Web and ontology-driven biological data integration methods

Papers should be at most 10 pages long, single-spaced, in font size 10 or larger, with one-inch margins on all sides. Papers in PDF or PS format should be sent by email to both co-chairs. For camera-ready formatting, authors may refer to previous BIOKDD proceedings (e.g., BIOKDD06).

Important Dates

5/25/2007      Submission of Papers
6/23/2007      Notification of Acceptance; Workshop Registration Open
7/14/2007      Submission of Camera Ready Papers
8/12/2007      Full-day Workshop Presentation


Submission of accepted papers. Each camera-ready version of an accepted workshop paper must be formatted strictly according to the official ACM Proceedings Format. Please submit PDF files only. To prepare the camera-ready PDF file, you may use either the Microsoft Word template or the LaTeX files and preparation instructions for the ACM Proceedings Format. All final camera-ready submissions must be accompanied by a completed digital copy (a scan is acceptable) of the ACM copyright transfer form; otherwise, the paper cannot be included in the final workshop proceedings.

Publication of proceedings and expanded papers. The workshop editorial (small PDF file) and the full workshop proceedings (large PDF file) are available online. Expanded versions of selected high-quality papers from the workshop will be invited for publication in a special issue (late spring/summer 2008) of the Journal of Bioinformatics and Computational Biology (JBCB). Details of the journal/book publication will be announced on this web site after the workshop.

Program Overview

Duration: 1 FULL DAY (08/12/07)

Location: BIOKDD '07 will be held in conjunction with ACM KDD 2007 at the Fairmont San Jose Hotel in San Jose, California, USA. The contact information for the hotel is as follows:

Fairmont Hotel
170 South Market Street
San Jose, CA, 95113
Tel: (408) 998-1900
Fax: (408) 287-1648

Keynote Speaker: Atul Butte, MD, PhD, Stanford University, "Exploring Genomic Medicine Using Integrative Biology"

Registration

  1. Workshop registration is required for each accepted paper. The fee covers hospitality and administrative expenses related to organizing the workshop. The registration fee is $60 per workshop paper presenter (without printed proceedings), $80 per workshop paper presenter (with printed proceedings), or $60 per official full-day participant who needs printed copies of the workshop proceedings. For those who are not presenting and do not intend to participate in the full-day event, official registration is not required but is recommended.
  2. Registration is open as of July 25, 2007 and will close on August 10, 2007.
  3. Please also note that the ACM SIGKDD '07 conference has a separate registration process for those interested in attending the whole ACM KDD conference. The conference registration ($700), however, is not required to participate in this workshop.

To register officially for the workshop, please pay the fees using Google Checkout.

Workshop Schedule

Note: the allocated time includes presentation time, Q&A time (5 min), and transition time from one speaker to the next (2 min).

8:50-9:00am: Opening Remarks

Session 1.
9:00-9:30am: Talk 1
“Gene Selection by Matrix Reordering and Replicator Dynamics”, Wenyuan Li, Xiuwen Zheng, and Ying Liu, University of Texas at Dallas and University of Washington.
9:30-10:00am: Talk 2
“Investigating the Use of Extrinsic Similarity Measures for Microarray Analysis”, Duygu Ucar, F. Altiparmak, H. Ferhatosmanoglu, and Srinivasan Parthasarathy, The Ohio State University.

10:00-10:30am: Coffee Break

Session 2.
10:30-11:00am: Talk 3
“Mining Over-Represented 3D Patterns of Secondary Structures in Proteins”, Matteo Comin, Concettina Guerra and Giuseppe Zanotti, University of Padova, Italy and Georgia Institute of Technology.
11:00am-12:00pm: Invited Talk
“Exploring Genomic Medicine Using Integrative Biology”, Atul Butte, Stanford University School of Medicine and the Lucile Packard Children's Hospital.

12:00-1:30pm: Lunch

Session 3.
1:30-2:00pm: Talk 4
“Combining Domain Fusions and Domain-Domain Interactions to Predict Protein-Protein Interactions”, Nguyen Thanh Phuong and Tu Bao Ho, Japan Advanced Institute of Science and Technology.
2:00-2:30pm: Talk 5
“A Linear-time Algorithm for Predicting Functional Annotations from Protein-Protein Interaction Networks”, Yonghui Wu and Stefano Lonardi, University of California, Riverside.
2:30-3:00pm: Talk 6
“Profile-feature based Protein Interaction Extraction from Full-Text Articles”, Shilin Ding, Minlie Huang, Hongning Wang, and Xiaoyan Zhu, Tsinghua University, China.
3:00-3:30pm: Talk 7
“A Decomposition Approach for Discovering Network Building Blocks”, Qiaofeng Yang and Stefano Lonardi, Lawrence Berkeley National Laboratory and University of California, Riverside.

3:30-4:00pm: Coffee Break

Session 4.
4:00-4:20pm: Short Talk 1
“Use of Gene Ontology as a Tool for Assessment of Analytical Algorithms with Real Data Sets: Impact of Revised Affymetrix CDF Annotation”, Megan Kong, Zhongxue Chen, Yu Qian, Jennifer Cai, Jamie Lee, Eva Rab, Monnie McGee, and Richard H. Scheuermann, University of Texas Southwestern Medical Center and Southern Methodist University.
4:20-4:40pm: Short Talk 2
“Clustering of Non-Alignable Protein Sequences”, Abdellali Kelil, Shengrui Wang, and Ryszard Brzezinski, University of Sherbrooke, Sherbrooke, QC, Canada.
4:40-5:00pm: Short Talk 3
“Discovering Ovarian Cancer Biomarkers using Gene Ontology Based Microarray Analysis”, Wei Guan, Alexander Gray, Sham Navathe, Nathan Bowen, John McDonald, and Lilya Matyunina, Georgia Institute of Technology.

5:00pm: Concluding Remarks


Program Chairs

Jake Y. Chen
Indiana University School of Informatics
Purdue School of Science, Department of Computer & Information Science
Indiana University–Purdue University Indianapolis
Indianapolis, IN 46202

Stefano Lonardi
Department of Computer Science & Engineering
University of California
Riverside, CA 92521

General Chair

Mohammed Zaki
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, NY 12180-3590

Program Committee

Amandeep Sidhu Curtin University, Australia
Eamonn Keogh University of California, Riverside
Daisuke Kihara Purdue University
Giuseppe Lancia University of Udine, Italy
Guojun Li ShanDong University, China
Haixu Tang Indiana University
Huanmei Wu IUPUI
Isidore Rigoutsos  IBM T. J. Watson Research Center
Jason Wang New Jersey Institute of Technology
Jie Zheng NCBI, USA
Jignesh M. Patel  University of Michigan
Knut Reinert Freie Universität Berlin, Germany
Li Liao University of Delaware
Luke Huan University of Kansas
Fenglou Mao University of Georgia
Muhammad Abulaish Jamia Millia Islamia, India
Natasa Przulj University of California Irvine
Pan Du Northwestern University
Phoebe Chen Deakin University, Australia
Rahul Singh San Francisco State University
Richard Scheuermann University of Texas Southwestern
Simon Lin Northwestern University
Xiang Zhang Purdue University
Teresa Przytycka NCBI/NLM, USA
Tony Hu Drexel University
Xiaoyan Zhu Tsinghua University, China
Yi Pan Georgia State University
Yu-Ping Wang University of Missouri



