Modern (i.e., large-scale, or “big data”) machine learning and data science typically proceed by formulating the desired outcome as the solution to an optimization problem, then using suitable algorithms to solve that problem efficiently.
The first portion of this course introduces the probability and optimization background necessary to understand the randomized algorithms that dominate applications of ML and large-scale optimization, and surveys several popular randomized and deterministic optimization algorithms, placing the emphasis on those widely used in ML applications.
The second portion of the course introduces the model architectures used in modern machine learning, which are favored because of the proven effectiveness of their inductive biases, and presents common regularization techniques used to mitigate the issues that arise when solving the nonlinear optimization problems ubiquitous in modern machine learning.
The homeworks involve hands-on applications and empirical characterizations of the behavior of these algorithms and model architectures. A project gives the students experience in critically reading the research literature and crafting articulate technical presentations.
The syllabus is available as an archival pdf, and is more authoritative than this website.
Instructor: Alex Gittens (gittea at rpi dot edu)
Lectures: MTh 10am-12pm ET (in person, Sage 3101)
Questions and Discussions: Campuswire
Office Hours: MTh 12pm-1pm ET in the #office-hours live chat on Campuswire, or by appointment
TA: Ian Bogle (boglei at rpi dot edu)
TA Office Hours: TWed 1-2pm ET
Course Text: None
Grading Criteria:
- Homeworks, 50%
- Weekly Participation, 15%
- Project, 35%
Letter grades will be computed from the semester average. Lower-bound cutoffs for A, B, C and D grades are 90%, 80%, 70%, and 60%, respectively. These bounds may be moved lower at the instructor's discretion.
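As a worked example of the grading arithmetic, the sketch below computes a semester average from the listed weights and maps it to a letter using the stated lower-bound cutoffs. It is illustrative only: the function names are hypothetical, and it assumes each component score is on a 0 to 100 scale, that each cutoff is inclusive, and that anything below 60% falls back to "F". The syllabus does not specify these details. Because the instructor may lower the cutoffs, the computed letter is a lower bound on the actual grade.

```python
from typing import Mapping

# Component weights from the grading criteria (they sum to 1.0).
WEIGHTS = {"homeworks": 0.50, "participation": 0.15, "project": 0.35}

# Lower-bound cutoffs, checked from highest to lowest; the instructor may lower these.
CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def semester_average(scores: Mapping[str, float]) -> float:
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(average: float) -> str:
    """Highest letter whose (assumed inclusive) cutoff the average meets."""
    for cutoff, letter in CUTOFFS:
        if average >= cutoff:
            return letter
    return "F"  # assumed fallback; not stated in the syllabus


# Example: 0.50 * 92 + 0.15 * 100 + 0.35 * 85 = 46 + 15 + 29.75 = 90.75
example = {"homeworks": 92, "participation": 100, "project": 85}
print(letter_grade(semester_average(example)))  # prints A
```

Note how much the project weighs: a 10-point drop on the project alone lowers the semester average by 3.5 points.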
Lectures
- Monday, August 30. Lecture 1. Course logistics; what is ML; examples of ML models: k-NN and SVMs. lecture notes
- Thursday, September 2. Lecture 2. Probability theory for modeling and analysis in ML, and as an algorithmic tool in optimization. Ordinary least squares. Population and empirical risk minimization. Basic probability: sample spaces, pmfs/pdfs, probability measures, random variables, random vectors, expectations, examples, joint pmfs/pdfs. lecture notes
- Tuesday, September 7. Lecture 3. Joint random variables, marginals, conditional distributions, independence, conditional independence, Naive Bayes assumption for classification. lecture notes
- Thursday, September 9. Lecture 4. Point estimates: expectation and variance. Independence, expectation, and variance. Law of Large Numbers. Conditional expectation. Conditional expectation as a point estimate for least squares regression; the regression function. lecture notes
- Monday, September 13. Lecture 5. Parameterized ML models. Generalized linear models: Poisson regression, Bernoulli regression (logistic regression), Categorical regression (multiclass logistic regression). Maximum likelihood estimation via minimization of negative log-likelihood. MLE for the Gaussian model is ordinary least squares. lecture notes
- Thursday, September 16. Lecture 6. Binary and multiclass logistic regression from a different viewpoint: geometric interpretations and linear separability, MLE for both leads to ERM, the logistic loss function, softmax, and logsumexp. lecture notes
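To make the September 16 loss functions concrete, here is a minimal pure-Python sketch; the function names are illustrative and are not taken from the lecture notes. It assumes binary labels y in {-1, +1} and 0-indexed class labels for the multiclass case. The multiclass logistic loss is the negative log-likelihood -log softmax(Wx)[y] = logsumexp(Wx) - (Wx)[y], and logsumexp is computed stably by subtracting the largest score before exponentiating. Averaging the loss over a training set gives the ERM objective.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product <a, b>."""
    return sum(ai * bi for ai, bi in zip(a, b))


def logsumexp(z: Vector) -> float:
    """log(sum_k exp(z_k)), computed stably by factoring out max(z)."""
    m = max(z)
    return m + math.log(sum(math.exp(zk - m) for zk in z))


def softmax(z: Vector) -> list:
    """Class probabilities exp(z_k) / sum_j exp(z_j) = exp(z_k - logsumexp(z))."""
    lse = logsumexp(z)
    return [math.exp(zk - lse) for zk in z]


def multiclass_logistic_loss(W: Sequence[Vector], x: Vector, y: int) -> float:
    """Negative log-likelihood -log softmax(Wx)[y] = logsumexp(Wx) - (Wx)[y]."""
    scores = [dot(w_k, x) for w_k in W]  # one score per class (one row of W per class)
    return logsumexp(scores) - scores[y]


def binary_logistic_loss(w: Vector, x: Vector, y: int) -> float:
    """Logistic loss log(1 + exp(-y <w, x>)) for a label y in {-1, +1}."""
    margin = y * dot(w, x)
    # log(1 + exp(-margin)) = logsumexp([0, -margin]), which is stable for any margin.
    return logsumexp([0.0, -margin])


def empirical_risk(w: Vector, X: Sequence[Vector], Y: Sequence[int]) -> float:
    """ERM objective: average binary logistic loss over the training set."""
    return sum(binary_logistic_loss(w, x, y) for x, y in zip(X, Y)) / len(X)


# A margin of -1000 would overflow a naive exp(1000); the stable form does not.
print(binary_logistic_loss([1.0], [1000.0], -1))  # prints 1000.0
```

Setting W = [w, 0] recovers the binary case: the class scores become (w·x, 0), and the multiclass loss for class 0 equals log(1 + exp(-w·x)), the binary loss for y = +1. For vectorized code, `scipy.special.logsumexp` performs the same stable computation.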
Homeworks and Weekly Participation
- CSCI 6961 Homework/Participation submission link: pdf and python code only, 1MB limit
- CSCI 4961 Homework/Participation submission link: pdf and python code only, 1MB limit
Late assignments will not be accepted, unless you contact the instructor at least two days before the due date to receive a deferral. Deferrals will be granted at the instructor’s discretion, of course.
- Weekly Participation 1. Posted 9/9/2021, due 9/13/2021.
- Weekly Participation 2. Posted 9/13/2021, due 9/20/2021.
- Homework 1. Posted 9/13/2021, due 9/23/2021.
Course Project
In teams of up to five, you will present either an original research project or an exposition on a topic relevant to the course. See the project page for more details and deadlines. Your group assignments will be posted to Campuswire.
Supplementary Materials
For your background reading, if you are unfamiliar with the linear algebra and probability being used:
- Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares. Boyd and Vandenberghe.
- Jeff Erickson's notes on discrete probability. Erickson.
- Introduction to Probability, Statistics, and Random Processes. Pishro-Nik.
- Chapter 3 of "Deep Learning". Goodfellow, Bengio, and Courville.
- Chapter 1 of "Bayesian Reasoning and Machine Learning". Barber.
- Convexity and Optimization. Lecture notes by R. Tibshirani.
- Optimization for Machine Learning. Lecture notes by E. Hazan.
- Optimization Methods for Large-scale Machine Learning. SIAM Review article. Bottou, Curtis, and Nocedal.
- Theory of Convex Optimization for Machine Learning. Bubeck.
- Convex Optimization. Boyd and Vandenberghe.