CSCI 4961/6961 Machine Learning and Optimization, Fall 2020

Overview

Modern (i.e. large-scale, or “big data”) machine learning and data science typically proceed by formulating the desired outcome as the solution to an optimization problem, then applying randomized algorithms to solve the resulting problems efficiently. This class introduces the probability and optimization background necessary to understand these randomized algorithms, and surveys several popular randomized algorithms, placing the emphasis on those widely used in ML applications. The homeworks will involve hands-on applications and empirical characterizations of the behavior of these algorithms.

Course Logistics

The syllabus is available as an archival PDF and is more authoritative than this website.

Instructor: Alex Gittens (gittea at rpi dot edu)

Lectures (via Zoom): MTh 10am-12pm (email me if you have not yet received the Zoom invitation; this may happen if you enrolled close to the start of the semester)

Questions and Discussions: Piazza

Office Hours (use Submitty Instructor OH queue with code mloptoh): MTh 12pm-1pm ET, or by appointment

TA: Owen Xie (xieo at rpi dot edu)

TA Office Hours (use Submitty TA OH queue with code mloptoh): T 9-10am ET, Th 4-5pm ET

Course Text: None

Grading Criteria:

Letter grades will be computed from the semester average. Lower-bound cutoffs for A, B, C, and D grades are 90%, 80%, 70%, and 60%, respectively. These bounds may be moved lower at the instructor's discretion.

Lecture Schedule

The recorded lectures are made available on the course's MediaSite channel shortly after each lecture: https://mediasite.mms.rpi.edu/Mediasite5/Channel/mlandopt.

  1. Monday, August 31. Lecture 1. Course logistics, what is ML, k-nearest neighbors, SVMs. lecture notes.
  2. Thursday, September 3. Lecture 2. Finish SVMs, gradient vs stochastic first-order methods, probability distributions, discrete and continuous random variables. Lecture notes.
  3. Tuesday, September 8. Lecture 3. Mean and variance, random vectors, marginals, covariances, covariance matrix, multivariate Gaussians. Lecture notes.
  4. Thursday, September 10. Lecture 4. Independence and conditioning, Law of Large Numbers, Bayes' rule, Law of Total Probability, Law of Total Expectation, Tower Property, Von Neumann's algorithm for using a biased coin to simulate a fair one (a short code sketch of this algorithm appears after this schedule). Lecture notes.
  5. Monday, September 14. Lecture 5. Runtime of Von Neumann's algorithm, transformations of random variables, properties of multivariate Gaussians, empirical risk minimization. Lecture notes.
  6. Thursday, September 17. Lecture 6. Regression losses; conditional mean as Bayes optimal estimator for squared loss; decomposition of risk of empirical risk minimizer into approximation error, generalization gap and optimization error; maximum likelihood estimation and logistic regression. Lecture notes.
  7. Monday, September 21. Lecture 7. Multinomial logistic regression: logits, softmax, logsumexp (a numerically stable softmax/logsumexp sketch appears after this schedule). Regularization: overfitting, l2 and l1 regularization, regularized empirical risk minimization. Lecture notes. Scikit-learn code/example of overfitting and regularization.
  8. Thursday, September 24. Lecture 8. Confusion matrices, convex sets, convex functions, convex optimization, consequences of convexity for optimization, Jensen's inequality, first- and second-order characterizations of convexity, example: logsumexp is convex. Lecture notes.
  9. Monday, September 28. Lecture 9. Jensen's inequality (proof); examples of convex functions; operations that preserve function convexity; examples of convex sets; examples of convex optimization problems. Lecture notes.
  10. Thursday, October 1. Lecture 10. Project expectations and guidelines; optimality conditions for smooth and nonsmooth, constrained and unconstrained convex optimization; Cauchy-Schwarz and angles between vectors; gradient descent and projected gradient descent; subdifferentials and subgradients; examples of subdifferentials: the positive part and the absolute value function. Lecture notes.
  11. Monday, October 5. Lecture 11. Subdifferential of l1 norm; calculation rules for subdifferentials; subdifferential of l2 norm; subdifferential of l1-regularized SVM; computational example: error-correcting output coding classification using gradient descent. Lecture notes. Example of gradient descent for ECOCC on Fashion-MNIST.
  12. Thursday, October 8. Lecture 12. Stochastic gradient descent; stochastic subgradient descent; minibatches and epochs (a minimal minibatch SGD sketch appears after this schedule). Lecture notes (updated since original posting).
  13. Thursday, October 15. Lecture 13. Backtracking line search, method of steepest descent, Newton's method. Lecture notes.
  14. Monday, October 19. Lecture 14. Adaptive gradient descent methods: a motivating example where feature-specific learning rates help, SGD with momentum, Nesterov's Accelerated Gradient Descent, AdaGrad (the momentum and AdaGrad update rules are sketched after this schedule). Lecture notes. Example code comparing accelerated methods.
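
The following sketches expand on items flagged in the schedule above. They are illustrative code written for this page, not the code used in lecture; the function names, hyperparameters, and data in them are assumptions chosen for the examples.

A minimal sketch of Von Neumann's biased-to-fair coin procedure from Lectures 4 and 5: flip the biased coin in pairs, return the first outcome of a pair whose two flips differ, and discard the rest.

    import random

    def fair_flip(biased_flip):
        # Flip the biased coin twice; heads-tails and tails-heads are equally
        # likely, so accept those pairs and retry on heads-heads or tails-tails.
        while True:
            a, b = biased_flip(), biased_flip()
            if a != b:
                return a

    biased = lambda: random.random() < 0.7   # heads with probability 0.7 (illustrative)
    flips = [fair_flip(biased) for _ in range(10000)]
    print(sum(flips) / len(flips))           # should be close to 0.5

Each pair is accepted with probability 2p(1-p), so on average 1/(2p(1-p)) pairs are needed, which is the runtime discussed in Lecture 5.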
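
A minimal sketch of a numerically stable softmax computed via logsumexp, as in Lecture 7; shifting by the maximum logit prevents overflow and does not change the result.

    import numpy as np

    def logsumexp(z):
        # log(sum_j exp(z_j)), computed after shifting by max(z) so that
        # no exponential overflows; the shift cancels after taking the log.
        m = np.max(z)
        return m + np.log(np.sum(np.exp(z - m)))

    def softmax(z):
        # softmax(z)_k = exp(z_k - logsumexp(z)), identical to the usual
        # formula but safe for large logits.
        return np.exp(z - logsumexp(z))

    logits = np.array([1000.0, 1001.0, 1002.0])  # naive exp() overflows here
    print(softmax(logits))                       # approximately [0.09, 0.245, 0.665]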
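
A minimal sketch of minibatch stochastic gradient descent from Lecture 12, written for least-squares regression on synthetic data; the step size, batch size, and epoch count are illustrative assumptions.

    import numpy as np

    def minibatch_sgd(X, y, lr=0.1, batch_size=32, epochs=20, seed=0):
        # Minimize the average squared loss (1/n) * ||X w - y||^2 using
        # stochastic gradients computed on random minibatches.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            perm = rng.permutation(n)              # one pass over the data = one epoch
            for start in range(0, n, batch_size):
                idx = perm[start:start + batch_size]
                Xb, yb = X[idx], y[idx]
                grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)
                w -= lr * grad
        return w

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 5))
    w_true = np.arange(1.0, 6.0)
    y = X @ w_true + 0.01 * rng.normal(size=500)
    print(minibatch_sgd(X, y))                     # should be close to [1, 2, 3, 4, 5]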
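
A minimal sketch of the SGD-with-momentum and AdaGrad update rules from Lecture 14, applied to the toy objective f(w) = 0.5 * ||w||^2, whose gradient is w; the learning rates and momentum parameter are illustrative assumptions.

    import numpy as np

    def momentum_step(w, v, grad, lr=0.01, beta=0.9):
        # Heavy-ball momentum: accumulate a velocity and step along it.
        v = beta * v + grad
        return w - lr * v, v

    def adagrad_step(w, g2, grad, lr=0.1, eps=1e-8):
        # AdaGrad: accumulate squared gradients per coordinate and shrink the
        # effective learning rate for coordinates with large accumulated gradients.
        g2 = g2 + grad ** 2
        return w - lr * grad / (np.sqrt(g2) + eps), g2

    w_m, v = np.ones(3), np.zeros(3)
    w_a, g2 = np.ones(3), np.zeros(3)
    for _ in range(200):
        w_m, v = momentum_step(w_m, v, w_m)    # gradient of f at w_m is w_m
        w_a, g2 = adagrad_step(w_a, g2, w_a)   # gradient of f at w_a is w_a
    print(w_m, w_a)                            # both should have moved toward 0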

Homeworks and Weekly Participation

Homework and weekly participation submission link: PDF and Python code only, 1 MB limit

Late assignments will not be accepted unless you contact the instructor at least two days before the due date to request a deferral. Deferrals are granted at the instructor's discretion.

Project

In teams of up to five, you will present either an original research project or an exposition on a topic relevant to the course. See the project page for more details and deadlines. Your group assignments will be posted to Piazza.

Supplementary Materials

For your background reading, if you are unfamiliar with the linear algebra and probability being used:

If you want further reading on convexity and convex optimization:

As a reference for Keras, the machine learning framework we will use in the deep learning portion of the course: