CSCI 6971/4971 Large Scale Matrix Computations and Machine Learning, Spring 2019

Randomized vs. non-randomized Gauss-Seidel for solving a 250K-by-250K kernel regression problem.


Modern machine learning routinely deals with millions of points in high-dimensional spaces. Classical linear algebra and optimization algorithms can be prohibitively costly in such applications, as they aim at machine precision and/or scale super-linearly in the size of the input data. Randomization can be used to bring the costs of machine learning algorithms closer to linear in the size of the input data; this is done by sacrificing, in a principled manner, computational accuracy for increased speed. This course surveys modern randomized algorithms and their applications to machine learning, with the goal of providing a solid foundation for the use of randomization in large-scale machine learning.
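As a small illustration of the kind of method contrasted in the caption above, here is a minimal sketch of cyclic versus randomized Gauss-Seidel on a toy linear system. Everything in it is an assumption made for illustration and is not taken from the course materials: the function names (`make_system`, `residual_norm`, `gauss_seidel`), the 50-by-50 tridiagonal test matrix, and the uniform sampling rule are hypothetical choices. It does not reproduce the kernel regression experiment, and it makes no claim about which variant is faster.

```python
import random


def make_system(n):
    """Build a small symmetric positive definite tridiagonal system A x = b
    whose exact solution is the all-ones vector (a toy problem, assumed here)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 4.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    b = [sum(row) for row in A]  # A times the all-ones vector
    return A, b


def residual_norm(A, b, x):
    """Euclidean norm of the residual b - A x."""
    n = len(b)
    total = 0.0
    for i in range(n):
        r = b[i] - sum(A[i][j] * x[j] for j in range(n))
        total += r * r
    return total ** 0.5


def gauss_seidel(A, b, sweeps, randomized, seed=0):
    """Run `sweeps` passes of n coordinate updates each, starting from zero.

    Each update solves equation i exactly for x[i] with the other entries
    held fixed. The cyclic variant visits i = 0, 1, ..., n-1 in order; the
    randomized variant draws i uniformly at random (with replacement).
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for k in range(n):
            i = rng.randrange(n) if randomized else k
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diag) / A[i][i]
    return x


if __name__ == "__main__":
    A, b = make_system(50)
    for randomized in (False, True):
        x = gauss_seidel(A, b, sweeps=20, randomized=randomized)
        label = "randomized" if randomized else "cyclic"
        print(label, residual_norm(A, b, x))
```

Both variants do the same amount of work per update; they differ only in the order in which coordinates are visited, which is the design choice the randomized analysis exploits.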

Topics covered will include time-accuracy tradeoffs, stochastic first-order and second-order methods, applications of low-rank approximation, approximate kernel learning, distributed optimization, and hyperparameter optimization.

Course Information

The syllabus is available as an archival pdf, and is more authoritative than this website.

Course Text: Lecture notes (the ones you scribe for yourself).

Grading Criteria:

Students are expected to have writing supplies on hand in each class to complete the in-class pop quizzes. If you are an athlete, or are unable to attend every class for some other reason, make alternative arrangements with the instructor within the first two weeks of the course.

Letter grades will be computed from the semester average. Maximum lower bound cutoffs for A, B, C and D grades are 90%, 80%, 70%, and 60%, respectively. These bounds may be moved lower at the instructor's discretion.

Topics and Schedule

Tentative topics (subject to change, and in no particular order): block coordinate descent, projected stochastic (quasi-)Newton methods, solvers for l1-regularized problems, large-scale latent factor models, large-scale ranking, extreme multi-class and multi-label classification, low-rank matrix factorization and completion algorithms, ADMM, large-scale kernel machines, SVRG methods, distributed and asynchronous SGD, large-scale Gaussian processes, hyperparameter optimization, online learning.


All assignments must be typed (preferably in LaTeX) and are due at the start of class (defined as the first 15 minutes) via email. Late assignments will be penalized and accepted at the instructor's discretion.


As described in the syllabus, each student will present a paper to the class; this entails

CSCI6971 students: read the CS graduate seminar skills presentation on reading papers, and in your presentation, directly address the questions it suggests asking yourself as you are reading the paper.

You may choose any paper related to the material covered in class, and this choice must be approved. Here are some possibilities (if you choose one, let me know so I can remove it):

Paper selections are due March 28, your completed git repo is due April 22, and the presentations will be held the week of April 22. In the interim period, you must attend at least two office hours to discuss this paper with me:

Your efforts during those discussions, as well as your final presentation, will determine your project grade.

Supplementary Materials