Protein Structure Prediction

Edited by Mohammed J. Zaki and Chris Bystroff

Springer, 2008, ISBN: 978-1-58829-752-5


For forty years we have known the essential ingredients for protein folding: an amino acid sequence and water. But the problem of predicting a protein's three-dimensional structure from its sequence has eluded computational biologists, even in the age of supercomputers and high-throughput structural genomics. Despite the unsolved mystery of how a protein folds, advances are being made in predicting the interactions of proteins with other molecules, such as small ligands, nucleic acids, or other proteins. Protein Structure Prediction focuses on the various computational methods for prediction, their successes and their limitations, from the perspective of their most well-known practitioners. Leaders in the field provide insights into template-based methods of prediction, structure alignment and indexing, protein features prediction, and methods for de novo structure prediction. Protein Structure Prediction is a cutting-edge text that all researchers in the field should have in their libraries.


  • Describes cutting-edge approaches for protein structure prediction
  • Provides a comprehensive view of structure prediction methods and their assessment, including template-based approaches, structure alignment and indexing, feature prediction, and de novo methods
  • Chapters written by the most well-known practitioners in their areas

Table of Contents

Overview of Protein Structure Prediction
  • 1. A historical perspective of template-based protein structure prediction, Jun-tao Guo, Kyle Ellrott, and Ying Xu
  • 2. The assessment of methods for protein structure prediction, Anna Tramontano, Domenico Cozzetto, Alejandro Giorgetti, and Domenico Raimondo
Template-based Methods
  • 3. Aligning Sequences to Structures, Liam J. McGuffin
  • 4. Protein Structure Prediction Using Threading, Jinbo Xu, Feng Jiao, and Libo Yu
Structure Alignment and Indexing
  • 5. Algorithms for Multiple Protein Structure Alignment and Structure-Derived Multiple Sequence Alignment, Maxim Shatsky, Ruth Nussinov, and Haim J. Wolfson
  • 6. Indexing Protein Structures using Suffix Trees, Feng Gao and Mohammed J. Zaki
Protein Features Prediction
  • 7. Hidden Markov Models for Prediction of Protein Features, Christopher Bystroff and Anders Krogh
  • 8. The pros and cons of predicting protein contact maps, Lisa Bartoli, Emidio Capriotti, Piero Fariselli, Pier Luigi Martelli, and Rita Casadio
  • 9. Roadmap Methods for Protein Folding, Mark Moll, David Schwarz, and Lydia E. Kavraki
Methods for de novo Structure Prediction
  • 10. Scoring functions for de novo protein structure prediction revisited, Shing-Chung Ngan, Ling-Hong Hung, Tianyun Liu, and Ram Samudrala
  • 11. Protein-Protein Docking: Overview and Performance Analysis, Kevin Wiehe, Matthew W. Peterson, Brian Pierce, Julian Mintseris, and Zhiping Weng
  • 12. Molecular Dynamics Simulations of Protein Folding, Angel E. Garcia

Data Mining in Bioinformatics

Edited by Jason T. L. Wang, Mohammed J. Zaki, Hannu T. T. Toivonen and Dennis Shasha

Springer, 2004, ISBN 1-85233-671-4


The aim of this book is to introduce you to some of the best techniques of pattern discovery in molecular biology in the hope that you will build on them to make new discoveries on your own. The techniques draw from many fields of mathematical science ranging from graph theory to information theory to statistics to computer vision. We hope you find the book as fascinating to read as we have found it to write and edit.

Table of Contents

Part I: Overview
  • Chapter 1. Introduction to Data Mining in Bioinformatics; Jason T. L. Wang, Mohammed J. Zaki, Hannu T. T. Toivonen, Dennis Shasha (New Jersey Institute of Technology, Rensselaer Polytechnic Institute, University of Helsinki, Courant Institute, New York University), 3 - 8.
  • Chapter 2. Survey of Biodata Analysis from a Data Mining Perspective; Peter Bajcsy, Jiawei Han, Lei Liu, Jiong Yang (University of Illinois at Urbana-Champaign), 9 - 39.
Part II: Sequence and Structure Alignment
  • Chapter 3. AntiClustAl: Multiple Sequence Alignment by Antipole Clustering; Cinzia Di Pietro, Alfredo Ferro, Giuseppe Pigola, Alfredo Pulvirenti, Michele Purrello, Marco Ragusa, Dennis Shasha (University of Catania, Italy, Courant Institute, New York University), 43 - 57.
  • Chapter 4. RNA Structure Comparison and Alignment; Kaizhong Zhang (University of Western Ontario, Canada), 59 - 81.
Part III: Biological Data Mining
  • Chapter 5. Piecewise Constant Modeling of Sequential Data Using Reversible Jump Markov Chain Monte Carlo; Marko Salmenkivi, Heikki Mannila (University of Helsinki, Helsinki University of Technology, Finland), 85 - 103.
  • Chapter 6. Gene Mapping by Pattern Discovery; Petteri Sevon, Hannu T. T. Toivonen, Paivi Onkamo (University of Helsinki, Finland), 105 - 126.
  • Chapter 7. Predicting Protein Folding Pathways; Mohammed J. Zaki, Vinay Nadimpally, Deb Bardhan, Chris Bystroff (Rensselaer Polytechnic Institute), 127 - 141.
  • Chapter 8. Data Mining Methods for a Systematics of Protein Subcellular Location; Kai Huang, Robert F. Murphy (Carnegie Mellon University), 143 - 187.
  • Chapter 9. Mining Chemical Compounds; Mukund Deshpande, Michihiro Kuramochi, George Karypis (University of Minnesota), 189 - 215.
Part IV: Biological Data Management
  • Chapter 10. Phyloinformatics: Toward a Phylogenetic Database; Roderic D. M. Page (University of Glasgow, United Kingdom), 219 - 241.
  • Chapter 11. Declarative and Efficient Querying on Protein Secondary Structures; Jignesh M. Patel, Donald P. Huddler, Laurie Hammel (University of Michigan), 243 - 273.
  • Chapter 12. Scalable Index Structures for Biological Data; Ambuj K. Singh (University of California at Santa Barbara), 275 - 296.

Glossary, 297 - 301. References, 303 - 326. Biographies, 327 - 336. Index, 337 - 340.

Large-Scale Parallel Data Mining

Edited by Mohammed J. Zaki and Ching Tien "Howard" Ho

Springer, 2000, ISBN 3-540-67194-3, Series: Lecture Notes in Computer Science. LNAI State-of-the-Art Survey, Volume 1759


With the unprecedented growth rate at which data is being collected and stored electronically today in almost all fields of human endeavor, the efficient extraction of useful information from the available data is becoming an increasing scientific challenge and a massive economic need. This book presents thoroughly reviewed and revised full versions of papers presented at a workshop on the topic held during KDD'99 in San Diego, California, USA, in August 1999, complemented by several invited chapters and a detailed introductory survey to provide complete coverage of the relevant issues. The contributions cover all major tasks in data mining, including parallel and distributed mining frameworks, associations, sequences, clustering, and classification. All in all, the volume presents the state of the art in the young and dynamic field of parallel and distributed data mining methods. It will be a valuable source of reference for researchers and professionals.

Table of Contents

  • Parallel and Distributed Data Mining: An Introduction, Mohammed J. Zaki
Mining Frameworks
  • The Integrated Delivery of Large-Scale Data Mining: The ACSys Data Mining Project, Graham Williams, Irfan Altas, Sergey Bakin, Peter Christen, Markus Hegland, Alonso Marquez, Peter Milne, Rajehndra Nagappan, Stephen Roberts
  • A High Performance Implementation of the Data Space Transfer Protocol (DSTP), S. Bailey, E. Creel, R. Grossman, S. Gutti, H. Sivakumar
  • Active Mining in a Distributed Setting, Srinivasan Parthasarathy, Sandhya Dwarkadas, Mitsunori Ogihara
Associations and Sequences
  • Efficient Parallel Algorithms for Mining Associations, Mahesh V. Joshi, Eui-Hong (Sam) Han, George Karypis, Vipin Kumar
  • Parallel Branch-and-Bound Graph Search for Correlated Association Rules, Shinichi Morishita, Akihiro Nakaya
  • Parallel Generalized Association Rule Mining on Large Scale PC Cluster, Takahiko Shintani, Masaru Kitsuregawa
  • Parallel Sequence Mining on Shared-Memory Machines, Mohammed J. Zaki
Classification
  • Parallel Predictor Generation, David B. Skillicorn
  • Efficient Parallel Classification Using Dimensional Aggregates, Sanjay Goil, Alok Choudhary
  • Learning Rules from Distributed Data, Lawrence O. Hall, Nitesh Chawla, Kevin W. Bowyer, W. Philip Kegelmeyer
Clustering
  • Collective, Hierarchical Clustering from Distributed, Heterogeneous Data, Erik L. Johnson, Hillol Kargupta
  • A Data-Clustering Algorithm on Distributed Memory Multiprocessors, Inderjit S. Dhillon, Dharmendra S. Modha