# CSCI 4961/6961 Project Details, Fall 2020

The second major component of the course, after the homework, is the course project. This project must relate to the content of the course: it should involve some use of randomized algorithms for machine learning or optimization. Due to the size of the class, the projects will be done in groups of up to five; each group will consist entirely of undergraduates or entirely of graduate students. The group assignments will be posted to Piazza.

Groups will choose to do one of two types of projects: *research* projects, or *pedagogical* projects. In the former you will do original research related to the content of the course, either theoretical or applied; this research can be work you have already been conducting, or could be new. This research will be presented in a 20 minute presentation.

For pedagogical projects, you will develop a 20-minute presentation covering a topic relating to the course; part of this presentation must consist of a freshly designed empirical evaluation of the method or result under consideration. The presentation should be accompanied by a problem set, with solutions, that tests students' understanding after they watch the presentation; it should be similar in difficulty to the problem sets assigned in class. You can choose to survey algorithms in particular subfields of ML or optimization that we do not cover in class, or focus on a single paper in detail.

## Potential project ideas

Here are a few sample ideas to get your thoughts flowing:

- Select a recent paper from NeurIPS, ICML, ICLR
- the kmeans++ algorithm
- the Blendenpik algorithm for solving linear systems
- the Goemans–Williamson randomized approximation to MAX-CUT
- randomized triangle counting in graphs
- a topic in reinforcement learning
- locality sensitive hashing (e.g. SimHash)
- the randomized approximate Caratheodory theorem
- NewtonSketch for convex optimization
- randomized element-wise sparsification of a matrix
- compressed sensing (using the fact that random matrices satisfy the restricted isometry property)
- approximate Bayesian computation
- Kalman filtering applications in ML
- hidden Markov model applications in ML

## Grading Rubric and Deadlines

| Task | Due date (ET) | Percentage of grade | Details |
|---|---|---|---|
| Project selection | 11:59pm, October 12 | 10 | via email |
| Project progress report | 5pm, November 23 | 40 | via WebEx |
| Deliverables | 11:59pm, December 2 | 30 | via public GitHub repos and Box |
| Peer feedback | 11:59pm, December 9 | 20 | via Submitty |

See the "What is Research?", "Giving Talks", and "Reading Papers" slide decks from the 2019 CS Grad Skills Seminar to understand the expectations I will use to evaluate your projects.

## Project selection

Project selection will be handled by email: one person from each group should communicate with me. Groups must get *approval* to present the research or pedagogical topic that they have chosen. This means that, by the deadline, I need to have been informed of your decision *and* to have verified that it is an appropriate choice. You must provide me with enough information to make this determination: for pedagogical topics, what specific paper(s) will you cover, and why do they relate to the class? Similarly, for research topics, what is the problem you will tackle, what techniques will you attempt to use, and why does it relate to the class?

## Project Progress Report

*All of your group* must discuss your research/pedagogy in a group meeting with me by the deadline. Address all the points listed under Deliverables below, to show that you will be able to give a quality presentation. Justify your choice of empirical evaluations and baselines. I *highly* suggest you have your experiments done so that we can discuss them.

Schedule this meeting at least a week before the deadline: I will not be able to meet with every group on the day of the deadline. If you have difficulties with your research or pedagogy, meet with me to resolve them well before your scheduled formal discussion.

See my suggestions on reading papers from the 2019 CS Grad Skills Seminar.

## Deliverables

*Code* should be submitted in a single GitHub/GitLab repo for each project. The experimental results presented in your talk must be easily reproducible given access to this repo. The repo should contain:

- Well-documented cross-platform code for reproducing your experimental evaluations. Julia, Python, R, and C++/C are acceptable.
- Either include the data sets you used (if small enough), or provide a script that downloads and preprocesses them to the format that your code expects as input.
- A PDF slide deck for your 20-minute in-class presentation, using appropriately *typeset math and legible figures*, that addresses all of the points below.
- If your project is pedagogical, post the problem set here as well, clearly labeled.

*Presentations* should be uploaded to a group member's RPI Box account, and shared with me. I will then upload all the presentations to the course MyMediaSite. See Prof. Anshelevich's suggestions on giving a good talk from the 2019 CS Grad Skills Seminar.

Address the following points in your presentation to receive full credit for this portion. (This list is written for pedagogical projects; adjust appropriately for research projects.)

- Who are the authors, and what are the date and venue of publication?
- What is the problem that is addressed (pick one, if the paper addresses more than one), and why is it interesting or useful?
- What is the main result of the paper?
- Describe the result or algorithm and motivate it intuitively.
- What is the cost (time, space, or some other metric) of this algorithm, and how does it compare to prior algorithms for the same problem? (and similarly, for non-algorithmic results)
- What performance guarantees, if any, are provided for the algorithm?
- Give an accurate description of the analysis given in the paper: in simple cases this may be a tour through the entire argument; when this is not possible, focus on explaining a core lemma/theorem that supports the claim of the paper.
- Provide an empirical evaluation of the algorithm: compare its performance to reasonable baselines, and explore relevant aspects of the algorithm (its variability, sensitivity to relevant properties of the input, etc.). If presenting a non-algorithmic result and it is possible, provide some experimental evidence of its sharpness or lack thereof.
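
As an illustration of the last point, here is a minimal sketch of what an evaluation harness might look like, assuming a project on k-means++ seeding (one of the sample ideas above). It compares the k-means cost of D²-weighted seeding against a uniform-seeding baseline over repeated trials and reports the mean and standard deviation, so that variability is visible. The synthetic data generator and all function names (`make_blobs`, `evaluate`, and so on) are hypothetical choices made for this example, not a required structure; a real project would use data sets and baselines appropriate to its topic.

```python
import random
import statistics


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost(points, centers):
    """k-means cost: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def uniform_seeding(points, k, rng):
    """Baseline: choose k distinct points uniformly at random."""
    return rng.sample(points, k)


def kmeanspp_seeding(points, k, rng):
    """D^2 seeding: sample each new center with probability proportional
    to its squared distance from the nearest center chosen so far.
    Assumes the data has at least k distinct points (otherwise all
    weights can become zero and rng.choices raises ValueError)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def make_blobs(rng, k=5, per_cluster=40, spread=0.5, box=20.0):
    """Hypothetical synthetic data: k Gaussian blobs in a 2-D box."""
    points = []
    for _ in range(k):
        cx, cy = rng.uniform(0, box), rng.uniform(0, box)
        points.extend(
            (rng.gauss(cx, spread), rng.gauss(cy, spread))
            for _ in range(per_cluster)
        )
    return points


def evaluate(method, points, k, trials, base_seed=0):
    """Run `method` with a fresh, fixed seed per trial so results are
    reproducible; return (mean, sample standard deviation) of the cost."""
    costs = [
        cost(points, method(points, k, random.Random(base_seed + t)))
        for t in range(trials)
    ]
    return statistics.mean(costs), statistics.stdev(costs)


if __name__ == "__main__":
    data = make_blobs(random.Random(2020))
    for name, method in [("uniform", uniform_seeding),
                         ("kmeans++", kmeanspp_seeding)]:
        mean, std = evaluate(method, data, k=5, trials=30)
        print(f"{name:>9}: mean cost {mean:.1f} +/- {std:.1f}")
```

Fixing the seed of every trial, as `evaluate` does, is one simple way to make the numbers in your slides reproducible from the repo. Sweeping parameters such as `spread` or `k` would be one way to explore sensitivity to properties of the input.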

## Peer Feedback

Each student will be assigned two projects. They will be responsible for watching these project presentations and reviewing the code and problem sets, if any, then writing a short report summarizing their impressions of these three aspects of the projects. The grade will be based on the extent to which it is clear you engaged critically with the material.