CSCI4969-6969 Syllabus

Introduction

This course focuses on machine learning algorithms for analyzing biological data. It introduces the main topics in this area, including the analysis of protein and DNA sequences, protein structures, and molecular graphs. The emphasis is on the role of deep learning and data mining in computational biology and bioinformatics.

Learning Objectives

After taking this course, students will be:

  • knowledgeable about fundamental bioinformatics tasks, such as sequence and structure analysis and evolution, and biological networks, as well as machine learning methods in bioinformatics

  • able to understand the key algorithms for the main tasks

  • able to implement and apply the techniques to real-world datasets

Prerequisites

Prior knowledge of biology is not required. Prior exposure to linear algebra, probability, and statistics is a plus. You will be expected to read cutting-edge machine learning papers on course-related topics.

Assignments will require the use of Python, with NumPy for numeric computing and PyTorch (and related libraries) for deep learning. You will be given an account on the CCI cluster at RPI, which has state-of-the-art GPU nodes for deep learning.

Textbook

There is no required text for the course. Reading materials will be posted online. For the deep learning content, the following book is recommended:
Dive Into Deep Learning, A. Zhang, Z.C. Lipton, M. Li, A. Smola.

Grading Policy

Your grade will be a combination of the following items.

  • Assignments and HWs (70%): Assignments and HWs will be given throughout the semester. These will include an implementation component and may also include written questions. Most assignments will be due just before midnight on the due date. Late assignments will be accepted with a 15% grade penalty per day, for at most 3 days.

  • Final Project (20%): You will implement cutting-edge approaches from the literature and extend or improve them. You will read papers on the topic, implement and compare state-of-the-art methods, and write a report on your findings. Finally, you will present your project in class. Final projects can be done in groups of at most two students.

  • Attendance (10%): Students are required to attend and participate in the class. Attendance will be taken.
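
As an illustration only, the grading arithmetic above can be sketched in a few lines of Python. The function names are hypothetical and this is not official course tooling. The sketch assumes that all scores are on a 0-100 scale, that the late penalty is linear (15% of the earned score per day, not compounding), and that work more than 3 days late receives zero. The syllabus does not spell out these details, so the instructor's actual calculation may differ.

```python
# Hypothetical helpers illustrating the grading arithmetic; not official course tooling.

# Percent weights taken from the grading policy.
WEIGHTS = {"assignments": 70, "project": 20, "attendance": 10}

LATE_PENALTY_PER_DAY = 15  # percent deducted per day late
MAX_LATE_DAYS = 3          # late work is accepted for at most 3 days


def late_adjusted_score(raw_score: float, days_late: int) -> float:
    """Apply the late penalty to a single assignment score (0-100 scale).

    Assumptions (not stated in the syllabus): the penalty is linear in the
    number of days late, and work more than MAX_LATE_DAYS late scores zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > MAX_LATE_DAYS:
        return 0.0
    return raw_score * (100 - LATE_PENALTY_PER_DAY * days_late) / 100


def course_score(assignments: float, project: float, attendance: float) -> float:
    """Combine the three component scores (each 0-100) using the stated weights."""
    scores = {
        "assignments": assignments,
        "project": project,
        "attendance": attendance,
    }
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    print(late_adjusted_score(90, 2))   # assignment scored 90, submitted 2 days late
    print(course_score(80, 90, 100))    # weighted course score
```

Under these assumptions, an assignment scored at 90 and submitted two days late counts as 63, and component scores of 80 (assignments), 90 (project), and 100 (attendance) combine to a course score of 84.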

Students auditing the course are required to complete all assignments and projects, as well as attend the classes.

Academic Integrity

Students must work independently on all course assignments and projects. You may consult other members of the class on the assignments, but you must submit your own work. For instance, you may discuss general approaches to solving a problem, but you must implement the solution on your own (similarity detection software may be used). Any time you borrow material from the web or elsewhere, you must acknowledge the source. Copying and pasting from published sources or the internet is considered plagiarism and is not acceptable.

Student-teacher relationships are built on trust. Acts that violate this trust undermine the educational process. The Rensselaer Handbook of Student Rights and Responsibilities and The Rensselaer Graduate Student Supplement define various forms of Academic Dishonesty and procedures for responding to them. Submission of any assignment that is in violation of these policies will result in a penalty that the instructor deems appropriate to the infraction, ranging from a grade of zero on the assignment in question to failure of the class as a whole. The student will also be reported to the Dean of Students or the Dean of Graduate Education, as appropriate. If you have any questions concerning this policy before submitting an assignment, please ask for clarification.