Modern (i.e. large-scale, or “big data”) machine learning and data science typically proceed by formulating the desired outcome as the solution to an optimization problem, then using suitable algorithms to solve these problems efficiently.
The first portion of this course introduces the probability and optimization background necessary to understand the randomized algorithms that dominate applications of ML and large-scale optimization, and surveys several popular randomized and deterministic optimization algorithms, placing the emphasis on those widely used in ML applications.
The second portion of the course introduces architectures used in modern machine learning because of the proven effectiveness of their inductive biases, and presents common regularization techniques used to mitigate the issues that arise in solving the nonlinear optimization problems ubiquitous within modern machine learning.
The homeworks involve hands-on applications and empirical characterizations of the behavior of these algorithms and model architectures. A project gives the students experience in critically reading the research literature and crafting articulate technical presentations.
The syllabus is available as an archival pdf, and is more authoritative than this website.
Instructor: Alex Gittens (gittea at rpi dot edu)
Lectures: TF 2pm-3:50pm ET in Troy 2012
Questions and Discussions: Piazza
Office Hours: Tues 4:30pm-5:30pm ET and Thurs 3pm-4pm ET in Lally 316, or by appointment
TA: Jesse Ellin (ellinj2 at rpi dot edu)
TA Office Hours: Wed 2-4pm ET in AE 118
Course Text: None
Course sections: CSCI 4968 and CSCI 6968.
Letter grades will be computed from the semester average. Lower-bound cutoffs for A, B, C and D grades are 90%, 80%, 70%, and 60%, respectively. These bounds may be moved lower at the instructor's discretion.
- Lecture 1, January 10/2023. Course logistics; introduction to optimization and machine learning; support vector machines and ordinary least squares. Lecture notes.
- Lecture 2, January 13/2023. Recap of probability: pdfs, expectation, variance, independence, conditioning, marginalization; parameterized probability distributions. Lecture notes.
- Lecture 3, January 17/2023. Parameterized ML models: Gaussian noise model for regression, Bernoulli noise model for binary classification, Categorical noise model for multiclass classification; Maximum likelihood estimation (MLE) for fitting the first two of these models: leading to OLS, then binary logistic regression. Lecture notes.
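The MLE fit for the Bernoulli noise model can be sketched numerically; below is a minimal numpy example of fitting binary logistic regression by gradient ascent on the average log-likelihood. The synthetic data, weights, step size, and iteration count are all illustrative choices, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for the Bernoulli noise model (all values here are illustrative)
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])          # hypothetical "true" weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = (rng.uniform(size=n) < sigmoid(X @ w_true)).astype(float)

# MLE by gradient ascent on the average log-likelihood:
# grad = (1/n) * X^T (y - sigmoid(Xw))
w = np.zeros(d)
lr = 0.1
for _ in range(500):
    w += lr * X.T @ (y - sigmoid(X @ w)) / n
```

Maximizing the Bernoulli log-likelihood this way is exactly binary logistic regression; the fitted `w` should classify most of the training points correctly.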
- Lecture 4, January 20/2023. MLE for fitting the categorical (multiclass) model; KL-divergence and cross-entropy. Training, test, and validation data set splits. Lecture notes.
- Lecture 5, January 24/2023. Maximum A Posteriori (MAP) estimation for machine learning models. Regularization (ℓ1/2) and Regularized Empirical Risk Minimization. Risk Decomposition: approximation, generalization, and optimization errors. Lecture notes.
- Lecture 6, January 27/2023. Review of Taylor series. Oracle models of optimization. Convex sets, functions, and optimization problems. Strict convexity. Uniqueness of minimizers of strictly convex functions. Lecture notes.
- Lecture 7, January 31/2023. First and second order characterizations of convex functions. Positive semidefinite matrices. Examples of convex functions. Operations that preserve convexity. Examples of convex optimization problems. Jensen's inequality. Lecture notes.
- Lecture 8, February 3/2023. Projection onto a convex set. The Cauchy-Schwarz inequality and geometric interpretation. Optimality conditions for smooth convex optimization problem, unconstrained and constrained. Lecture notes.
- Lecture 9, February 7/2023. Example of using the smooth optimality condition for constrained optimization to find the formula for the projection onto the unit Euclidean ball. Subgradients, subdifferentials, and Fermat's condition for nonsmooth convex optimization problems. Extended value convex functions and convex indicator functions. Subgradient calculus rules. Subgradient of the l1-regularized svm objective function. Lecture notes.
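The projection formula derived in this lecture, for the unit Euclidean ball, is simple to express in code; a minimal numpy sketch:

```python
import numpy as np

def project_unit_ball(x):
    """Euclidean projection onto {z : ||z||_2 <= 1}: the smooth optimality
    condition gives x itself inside the ball, and x / ||x|| (rescaling to
    the boundary) outside it."""
    nrm = np.linalg.norm(x)
    return x if nrm <= 1.0 else x / nrm

print(project_unit_ball(np.array([3.0, 4.0])))  # rescaled to norm 1: [0.6 0.8]
print(project_unit_ball(np.array([0.3, 0.4])))  # already inside: unchanged
```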
- Lecture 10, February 10/2023. Gradient descent. Formulations of approximate convergence. Smoothness of convex functions and the descent lemma. Projected gradient descent. Projected subgradient descent and Polyak averaging. Visualization of gradient descent. Lecture notes.
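Projected gradient descent alternates a gradient step with a projection back onto the constraint set. A minimal sketch, minimizing a least-squares objective over the unit Euclidean ball (the problem instance and iteration budget are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

def f(x):
    return 0.5 * np.sum((A @ x - b) ** 2)

def project(x):
    # Euclidean projection onto the unit ball
    nrm = np.linalg.norm(x)
    return x if nrm <= 1.0 else x / nrm

# Projected gradient descent: gradient step, then project onto the set.
# Step size 1/L, where L = ||A||_2^2 is the smoothness constant of f.
L = np.linalg.norm(A, 2) ** 2
x = np.zeros(5)
for _ in range(300):
    x = project(x - A.T @ (A @ x - b) / L)
```

Every iterate stays feasible because the projection is applied after each gradient step.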
- Lecture 11, February 14/2023. Strong convexity, convex condition numbers. Polyak-Lojasiewicz (PL) inequality. Gradient Descent converges linearly for strongly convex functions. Gradient Descent convergence rates for different classes of convex functions. Lecture notes.
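Linear convergence on strongly convex functions is easy to observe on a quadratic; in this sketch the error shrinks by a constant factor per iteration (the quadratic and step count are illustrative):

```python
import numpy as np

# f(x) = 0.5 x^T H x is mu-strongly convex and L-smooth with
# mu = 1 and L = 10 (the eigenvalues of H), so the condition number is 10.
H = np.diag([1.0, 10.0])
L = 10.0
x = np.array([1.0, 1.0])
errs = []
for _ in range(40):
    errs.append(np.linalg.norm(x))
    x = x - (H @ x) / L          # gradient descent with step size 1/L

# Linear (geometric) convergence: the distance to the minimizer shrinks
# by a constant factor < 1 at every step.
ratios = [errs[k + 1] / errs[k] for k in range(len(errs) - 1)]
```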
- Lecture 12, February 17/2023. Newton's method and the importance of modeling curvature. Stochastic gradient descent, minibatch stochastic gradient descent, convergence of SGD for non-convex functions with various step size choices. Lecture notes.
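Minibatch SGD replaces the full gradient with an unbiased estimate computed from a random subset of the data. A minimal least-squares sketch (data, batch size, and step size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)   # small label noise

# Minibatch SGD on f(w) = (1/n) sum_i (x_i^T w - y_i)^2: each step uses
# an unbiased gradient estimate computed from a random minibatch.
w = np.zeros(d)
batch, lr = 32, 0.05
for _ in range(2000):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    w -= lr * 2.0 * Xb.T @ (Xb @ w - yb) / batch
```

Each step costs O(batch * d) rather than O(n * d), which is the point of SGD at scale.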
- Lecture 13. February 24/2023. Practical stepsize selection for SGD. Drawbacks of SGD. Adaptive gradient descent methods: Polyak heavy-ball method, Nesterov's accelerated gradient descent, AdaGrad. Lecture notes. Code comparing adaptive gradient descent methods.
- Guest Lecture. February 28/2023. Adversarial Robustness of Neural Networks. Lecture notes.
- Guest Lecture. March 3/2023. Sketching for the solution of linear systems; the Kaczmarz method. (Lecture notes unavailable)
- Lecture 16. March 14/2023. Adaptive gradient descent: RMSProp and Adam. Neural networks and motivation for modern NN usage. Fully connected feedforward neural networks (FCFFNs), perceptron-type neurons, activation functions, expressivity of neural networks, vector notation for multilayer perceptron networks (MLPs). Lecture notes.
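The Adam update discussed here combines RMSProp-style second-moment scaling with a momentum-like first moment, both bias-corrected. A minimal sketch of a single update step (the default hyperparameters follow the Adam paper; the toy usage below is illustrative):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: RMSProp-style scaling by a running average of
    squared gradients (v), plus a momentum-like running average of the
    gradient itself (m), both bias-corrected."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)           # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)           # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage: minimize f(w) = 0.5 ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 3001):
    w, m, v = adam_step(w, w, m, v, t, lr=0.01)
```

Setting b1 = 0 recovers (bias-corrected) RMSProp, which makes the relationship between the two methods concrete.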
- Lecture 17. March 17/2023. Autoencoders, backpropagation, backpropagation for MLPs. Lecture notes (modified from in class). Two-layer MLP for classifying FashionMNIST images, in PyTorch.
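Backpropagation for an MLP is the chain rule applied layer by layer, reusing forward-pass activations. A minimal numpy sketch for a tiny two-layer network, checked against a finite difference (the network sizes and data are illustrative, not the in-class example):

```python
import numpy as np

rng = np.random.default_rng(3)

# Tiny MLP: h = ReLU(W1 x), out = W2 h, squared loss against target y_t.
x, y_t = rng.normal(size=4), rng.normal(size=2)
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(2, 3))

def loss(W1, W2):
    return 0.5 * np.sum((W2 @ np.maximum(W1 @ x, 0.0) - y_t) ** 2)

# Backpropagation: apply the chain rule layer by layer, reusing the
# activations computed in the forward pass.
z = W1 @ x
h = np.maximum(z, 0.0)
d_out = W2 @ h - y_t          # dL/d(out) for the squared loss
gW2 = np.outer(d_out, h)      # dL/dW2
d_h = W2.T @ d_out            # backprop through the second linear layer
d_z = d_h * (z > 0)           # backprop through the ReLU
gW1 = np.outer(d_z, x)        # dL/dW1

# Sanity check against a finite difference on one entry of W1
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
fd = (loss(W1p, W2) - loss(W1, W2)) / eps
```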
- Lecture 18. March 21/2023. Convolutional neural networks: convolutional filters, inductive biases, convolutional layers, multichannel convolutional layers, CNNs, pooling, LeNet5. Lecture notes. LeNet5 implementation in PyTorch, applied to FashionMNIST.
- Lecture 19. March 24/2023. Transposed convolutions and the U-net architecture for image to image problems, efficient convolutions as GEMM operations, sketch of backprop for convolutional layers. Issues with deep networks: overfitting, vanishing and exploding gradients, hyperparameter selection. GoogLeNet Inception v1 architecture, auxiliary classification losses. Lecture notes. References: U-Net paper, Inception v1 paper, overview of Inception architectures.
- Lecture 20. March 28/2023. Batch and Layer Normalization for smoother propagation of gradient information; modified batch normalization for CNNs. Skip connections and ResNets for the same. Lecture notes. References: Batch Normalization paper, Layer Normalization paper, ResNets paper. Pytorch code illustrating impact of batch normalization.
- Lecture 21. March 31/2023. Dropout to prevent overfitting; interpretation as an ensembling method. RNNs for sequential data; design patterns for RNNs, backpropagation through time (BPTT), truncated BPTT. Lecture notes.
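The inverted-dropout formulation makes the train/test distinction from this lecture concrete: scaling the surviving activations during training keeps their expectation unchanged, so test time needs no correction. A minimal numpy sketch:

```python
import numpy as np

def dropout(h, p, rng, train=True):
    """Inverted dropout: during training each activation is zeroed with
    probability p and the survivors are scaled by 1/(1-p), so the
    expected activation is unchanged; at test time it is the identity."""
    if not train:
        return h
    mask = (rng.uniform(size=h.shape) >= p) / (1.0 - p)
    return h * mask

rng = np.random.default_rng(4)
h = np.ones(100_000)
out = dropout(h, p=0.5, rng=rng)
# About half the entries are zeroed and the rest doubled,
# so the mean stays near 1.
```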
- Lecture 22. Apr 4/2023. NLP preprocessing: tokenization and special tokens, normalization, vocabulary generation and handling unknown tokens, embedding layers. Sequence classification using a many-to-one RNN architecture. Long short-term memory units (LSTMs). Bidirectional RNNs, deep RNNs. Lecture notes. Python code for classifying the AG_NEWS data set using an RNN.
- Lecture 23. Apr 7/2023. Sequence-to-sequence modeling using encoder-decoder RNN architectures; teacher forcing; seq2seq modeling using RNNs with attention. Lecture notes. For more details (e.g., on beam decoding), see the original papers Sequence to Sequence Learning With Neural Networks and Neural Machine Translation by Jointly Learning to Align and Translate.
Homeworks and Weekly Participation
Homework/Participation submission link: pdf and python code only, 1MB limit
Late assignments will not be accepted unless you contact the instructor at least two days before the due date to request a deferral. Deferrals are granted at the instructor's discretion.
- Weekly Participation 1.
- Weekly Participation 2.
- Weekly Participation 3.
- Weekly Participation 4.
- Weekly Participation 5.
- Weekly Participation 6.
- Weekly Participation 7. To be posted.
- Weekly Participation 8. To be posted.
- Weekly Participation 9. To be posted.
In teams of up to five, you will present either an original research project or an exposition on a topic relevant to the course. See the project page for more details and deadlines. Your group assignments will be posted to Piazza.
Supplementary Materials
For your background reading, if you are unfamiliar with the linear algebra and probability being used:
- Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares. Boyd and Vandenberghe.
- Jeff Erickson's notes on discrete probability. Erickson.
- Introduction to Probability, Statistics, and Random Processes. Pishro-Nik.
- Chapter 3 of "Deep Learning". Goodfellow, Bengio, and Courville.
- Chapter 1 of "Bayesian Reasoning and Machine Learning". Barber.
- Convexity and Optimization. Lecture notes by R. Tibshirani.
- Optimization for Machine Learning. Lecture notes by E. Hazan.
- Optimization Methods for Large-scale Machine Learning. SIAM Review article. Bottou, Curtis, and Nocedal.
- Theory of Convex Optimization for Machine Learning. Bubeck.
- Convex Optimization. Boyd and Vandenberghe.