CSCI4969-6969 Machine Learning in Bioinformatics

This course focuses on machine learning algorithms for analyzing biological data. It introduces the main topics in this area, including the analysis of genome sequences, protein structures, and gene networks. We will cover some of the traditional algorithms for these tasks, but the main focus is on the role of deep learning and data mining in computational biology and bioinformatics.

Class Hours: 10-11:50AM TF, Low 3039. Office Hours: 12-1PM TF

Syllabus: CSCI4969-6969 Syllabus

Campuswire: https://campuswire.com/c/GD5DDE12E/

Submitty: https://submitty.cs.rpi.edu/s20/csci4969

Class schedule

| Date | Topic | Readings | Lecture Notes |
| --- | --- | --- | --- |
| Jan 14 | Introduction I | R1 | Intro |
| Jan 17 | Introduction II | R1, P1 | lecture1 |
| Jan 21 | Linear and Logistic Regression | R2, R3, P2 | lecture2 |
| Jan 24 | Word Embeddings | P2 | lecture3 |
| Jan 28 | Word2Vec | R4 | lecture4 |
| Jan 31 | Neural Networks I | R4 | lecture5 |
| Feb 4 | MLPs | R4 | lecture6 |
| Feb 7 | RNNs and LSTMs | R5 | lecture7 |
| Feb 11 | Seq2Seq Models | | lecture8 |
| Feb 14 | Transformer and Attention | P4, P5 | lecture9 |
| Feb 18 | NO CLASS (Mon Schedule) | | |
| Feb 21 | BERT | P3, P5 | lecture10 |
| Feb 25 | CNNs | R5 | lecture11 |
| Feb 28 | Secondary Structure Prediction | P6 | lecture12 |
| Mar 3 | Secondary Structure Prediction | P6 | lecture13 |
| Mar 6 | Embeddings with Structure | P7 | lecture14 |
| Mar 10-Mar 20 | NO CLASS (Spring Break) | | |
| Mar 24 | Embeddings with Structure | P8 | lecture15, video-mar24 |
| Mar 27 | 3D Structure Prediction | P8 | lecture16, video-mar27 |
| Mar 31 | 3D Structure Prediction | P9 | lecture17, video-mar31 |
| Apr 3 | 3D Structure Prediction | P9 | lecture18, video-apr3 |
| Apr 7 | Structure Prediction Implementation | P9 | lecture19, video-apr7 |
| Apr 10 | Structure Prediction | P10, R6 | lecture20, video-apr10 |
| Apr 14 | Distance Geometry | R6 | lecture21, video-apr14 |
| Apr 17 | Molecular Graphs | P11, P12 | lecture22, video-apr17 |
| Apr 21 | Molecular Graphs | P11, P12 | lecture23, video-apr21 (apologies that only audio was recorded) |
| Apr 24 | Graph NNs, Protein Interface Prediction | P13, P14 | lecture24, video-apr24 |
| Apr 28 | Graph NNs, Protein Interface Prediction | P15 | lecture25, video-apr28 |

Papers

See https://github.com/hussius/deeplearning-biology for a list of papers on deep learning in computational biology.

The papers we will read appear below. They are referred to as Px in the course schedule above.

  1. P1: Deep learning: new computational modelling techniques for genomics, https://www-nature-com.libproxy.rpi.edu/articles/s41576-019-0122-6

  2. P2: Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics, https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141287

  3. P3: Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences, http://biorxiv.org/lookup/doi/10.1101/622803

  4. P4: Attention is all you need, https://arxiv.org/abs/1706.03762

  5. P5: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, https://arxiv.org/abs/1810.04805

  6. P6: DeepPrime2Sec: Deep Learning for Protein Secondary Structure Prediction from the Primary Sequences, https://www.biorxiv.org/content/10.1101/705426v1

  7. P7: Learning protein sequence embeddings using information from structure, https://arxiv.org/abs/1902.08661

  8. P8: End-to-end differentiable learning of protein structure, https://www.biorxiv.org/content/10.1101/265231v2

  9. P9: Improved protein structure prediction using potentials from deep learning, https://www.nature.com/articles/s41586-019-1923-7, https://deepmind.com/research/open-source/alphafold_casp13

  10. P10: Improved protein structure prediction using predicted interresidue orientations, https://www.pnas.org/content/117/3/1496

  11. P11: A Deep Learning Approach to Antibiotic Discovery, https://doi.org/10.1016/j.cell.2020.01.021

  12. P12: Analyzing Learned Molecular Representations for Property Prediction, https://doi.org/10.1021/acs.jcim.9b00237

  13. P13: End-to-End Learning on 3D Protein Structure for Interface Prediction, https://arxiv.org/abs/1807.01297

  14. P14: Protein Interface Prediction using Graph Convolutional Networks, https://papers.nips.cc/paper/7231-protein-interface-prediction-using-graph-convolutional-networks.pdf

  15. P15: Relational inductive biases, deep learning, and graph networks, https://arxiv.org/abs/1806.01261

Readings

These readings are referenced as Rx in the course schedule above.