10th International Workshop on Data Mining in Bioinformatics (BIOKDD 2011)

Bioinformatics is the science of managing, mining, and interpreting information from biological data. Various genome projects have contributed to an exponential growth in DNA and protein sequence databases. Advances in high-throughput technologies such as microarrays and mass spectrometry have further created the fields of functional genomics and proteomics, in which one can quantitatively monitor the presence of multiple genes, proteins, metabolites, and compounds in a given biological state. The ongoing influx of these data, the prospect of finding biological answers in the observed data despite noise, and the gap between data collection and knowledge curation have collectively created exciting opportunities for data mining researchers.

While tremendous progress has been made over the years, many of the fundamental problems in bioinformatics, such as protein structure prediction, gene-environment interaction, and regulatory pathway mapping, are still open. Besides these, new technologies such as next-generation sequencing are producing massive amounts of sequence data; managing, mining, and compressing these data raises challenging issues. Data mining will play an essential role in understanding these fundamental problems and in developing novel therapeutic and diagnostic solutions in post-genome medicine.

The goal of this workshop is to encourage KDD researchers to take on the numerous challenges that bioinformatics offers. This year, the workshop will feature the theme of Data Mining Challenges in Next-generation Sequencing (NGS). NGS is revolutionizing biological, biomedical, and health research. NGS technology poses enormous data analysis and knowledge discovery challenges, including expression analysis, mutational analysis, alternative splicing pattern discovery, whole transcription sequence alignment, epigenetic site discovery, storage and compression of high-volume sequence data, and clustering and classification of structural variations in a population.


Schedule


Workshop Schedule at a Glance
Sunday, August 21, 2011
8:25-9:25 Opening Remarks; Invited Speaker presentation 1
9:30-10:10
10:10-10:30 Coffee break
10:30-11:25 Invited Speaker presentation 2
11:30-12:45 Closing Remarks

Invited Speakers


  • Dr. Vineet Bafna, Professor, University of California, San Diego
  • Dr. Harry Gao, Director, DNA Sequencing/Solexa Core Lab, City of Hope


Table of Contents


Algorithm for Low-Variance Biclusters to Identify Coregulation Modules in Sequencing Datasets
Zhen Hu (University of Cincinnati)
Raj Bhatnagar (University of Cincinnati)

Analysis of Obligate and Non-obligate Complexes using Desolvation Energies in Domain-domain Interactions
Mina Maleki (University of Windsor)
Md. Mominul Aziz (University of Windsor)
Luis Rueda (University of Windsor)

Analysis of Obligate and Non-obligate Complexes using Desolvation Energies in Domain-domain Interactions
K.S.M. Tozammel Hossain (Virginia Tech)
Chris Bailey-Kellogg (Dartmouth College)
Alan Friedman (Purdue University)
Michael Bradley (Yale University)
Nathan Baker (Pacific Northwest National Laboratory)
Naren Ramakrishnan (Virginia Tech)

Analyze Influenza Virus Sequences Using Binary Encoding Approach
Hamching Lam (University of Minnesota)
Daniel Boley (University of Minnesota)

A Lung Cancer Outcome Calculator Using Ensemble Data Mining on SEER Data
Ankit Agrawal (Northwestern University)
Sanchit Misra (Northwestern University)
Ramanathan Narayanan (Northwestern University)
Lalith Polepeddi (Northwestern University)
Alok Choudhary (Northwestern University)


Organizers


  • Mohammad Al Hasan, Indiana University–Purdue University, Indianapolis
  • Jun (Luke) Huan, University of Kansas
  • Jake Y Chen, Indiana University–Purdue University, Indianapolis
  • Mohammed J Zaki, Rensselaer Polytechnic Institute