24 Feb 2020
================================================================================
* Homework Problem 1:
  - Simpler than you think
  - Nobody recovers or dies 'during' the simulation
    * Timeframe is considerably shorter than it takes to recover/die
    * Data is from actual conference interactions
  - 100 iterations of the 'full simulation'
    * Full = all edges are simulated
    * Take averages over the 100 full simulations

For Problem 2:
- No updates to the problem specification
================================================================================
* Project

First presentation on March 5th

Expectations for the presentation:
- A 5-10 minute presentation on:
  * The project proposal 'idea' - but it should be relatively thought out
  * The specific problem being addressed
  * The specific dataset for use or testing
  * The specific algorithms to implement / other code needing to be written
  * How testing is done - how will you know if you succeeded?
  * Anything else done so far

Submit via Submitty before class
* PDFs much preferred for presentations (rolling the dice with LibreOffice)
* But code/text files/data, etc. can also be included
  + Still include some text giving an overview addressing the above points
  + So I can look back at it for later presentations/progress/etc.

Only hard requirement:
* Has to involve graphs (interaction data, algorithms on graphs, etc.)
* Grad students:
  + Can involve ongoing research (but you still need to make progress)
  + Groups can involve both grad + undergrad students
  + |Group| = 1 is fine
================================================================================
* Link prediction via Matrix Factorization

We looked at solving an optimization problem:

  min_{U,V} sum_{nonzeros} (A_ij - u_i*V*u_j')^2 + w_0*sum_{zeros} (u_i*V*u_j')^2
            + β_1*|U|^2 + β_2*|V|^2

We attempted to solve A ~= UVU'
- Intuitively: U, V contain 'latent features' that describe the interactions in our data
- Solved via gradient descent
- To summarize:
  * Can be implemented relatively easily (~10 lines)
  * Can be solved relatively quickly (~a couple of minutes)
  * Solution quality was so-so (better than random, not as good as 'social' techniques)
  * Tough to generalize our solution, tough to add in new data
================================================================================
* Recommender Systems

Definition: 'systems' that attempt to predict or 'recommend' user preferences
- Link prediction is one problem that recommender systems try to solve
  * How we differentiate the problem here: trying to predict not just a link,
    but the explicit strength or rating that a user might give to that link
- Other examples:
  * Netflix recommending movies to you
  * Amazon recommending products
  * Advertising in general - how you'll feel about a product shown to you
================================================================================
* Collaborative Filtering

One common approach for recommender systems.
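As a concrete sketch of the link-prediction factorization summarized in the previous section, the A ~= UVU' model can be fit with plain gradient descent in a few lines of NumPy. The toy graph, rank k, step size, and regularization weight below are illustrative assumptions (and, for simplicity, this sketch fits all entries equally, i.e. w_0 = 1, rather than weighting the zeros separately):

```python
# Minimal sketch: link prediction via A ~= U V U' with gradient descent.
# Toy graph, rank k, alpha, and beta are illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
A = rng.integers(0, 2, size=(n, n)).astype(float)
A = np.triu(A, 1) + np.triu(A, 1).T            # symmetric 0/1 adjacency

U = 0.1 * rng.standard_normal((n, k))          # per-node latent features
V = 0.1 * rng.standard_normal((k, k))          # feature-interaction matrix
alpha, beta = 0.005, 0.01

for _ in range(2000):
    E = A - U @ V @ U.T                        # residuals e_ij
    # Gradients of sum e_ij^2 + beta*(|U|^2 + |V|^2)
    gU = -2 * (E @ U @ V.T + E.T @ U @ V) + 2 * beta * U
    gV = -2 * (U.T @ E @ U) + 2 * beta * V
    U -= alpha * gU
    V -= alpha * gV

scores = U @ V @ U.T                           # predicted link strengths
```

Entries of `scores` above the training values at zero positions are the candidate predicted links.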
Definition: Making a selection from a wide set of possibilities (filtering) for a
user, given the known preferences of that user, of those similar to that user,
or the known preferences in general (collaborative)

Like in Homework 2:
- Filtering what links (products) we're predicting for a user
- Based off of the purchases of 'similar' users
  * We used the Jaccard index + other measures to define similarity

General approaches for collaborative filtering:
- The approach from HW2: defining explicit similarities between users
- Current method: Matrix Factorization (latent features define similarity)
- Machine learning approaches: Neural Networks, Random Forests, etc.
  * Might need to explicitly realize these features
  * Can capture the non-linear behavior that Matrix Factorization might miss
================================================================================
* Netflix challenge

~2007 or so:
- Netflix: we have a predictor for user-movie preferences (CineMatch)
- Predicts the rating (1-5) that a specific user would give a specific movie

Challenge:
- Improve the accuracy of CineMatch by 10%
- Released hundreds of millions of user-movie ratings
- If you do, we'll give you $1,000,000
- Took a few years for a team to eventually get there
  * Used an aggregation of ML techniques
  * MF by itself got a full 7% improvement

Interesting notes:
- Temporal effects of ratings
  * Some movies were rated higher immediately after viewing than later on (Patch Adams)
  * Some had the opposite effect (Memento)
- A naive approach works relatively well:
  * a_ij = 0.5 * (mean(a_{i,:}) + mean(a_{:,j}))
  * I.e., the average of the user's average rating and the movie's average rating
================================================================================
* Adapting Matrix Factorization

Consider a user-rating matrix (users on rows, one column of ratings per movie)

To represent it as a graph:
- Create a bipartite graph: B1 = {users}, B2 = {movies}, E = {ratings}
- I.e., users give movies ratings

We want to predict
what rating a user will give to a movie they haven't seen.

We can use the same approach of matrix factorization:
- Consider A as a 'bipartite adjacency matrix'
- We want to solve A ~= UV'
- A = (n x m) matrix, U = (n x k), V = (m x k)
- For each i of n users => we have k latent features u_i
- For each j of m movies => we have k latent features v_j
- Prediction for a_ij = u_i*v_j'
- Our minimization problem:
  * min_{U,V} sum_{nonzeros} (a_ij - u_i*v_j')^2
  * also considering regularization: β_1*|U|^2 + β_2*|V|^2

To derive our gradient:

  (d e_ij^2) / (d u_i) = 2*(a_ij - u_i*v_j') * (-v_j') = -2*e_ij*v_j'
  (d e_ij^2) / (d v_j) = 2*(a_ij - u_i*v_j') * (-u_i)  = -2*e_ij*u_i

With the regularization terms: 2*β_1*u_i, 2*β_2*v_j

Full update equations:

  u_i = u_i + α*2*(e_ij*v_j' - β_1*u_i)
  v_j = v_j + α*2*(e_ij*u_i  - β_2*v_j)
================================================================================
* Considerations and challenges with Collaborative Filtering

Sparsity of the data:
- We have considerably fewer ratings than 'non-ratings'

Scale of the data:
- How do we adapt techniques as the data increases in scale?

Generalization:
- How does our approach generalize to new data?
- Have to be careful that we don't 'overfit' our training data

The 'cold-start' problem:
- How do we predict for a new user?

The 'black/grey sheep' problem:
- Some users are not similar to anyone (explicitly or implicitly)
- One approach: train without these datapoints
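The full update equations from the 'Adapting Matrix Factorization' section can be sketched as per-rating stochastic gradient descent steps. The toy ratings, rank k, α, and β values below are illustrative assumptions (a single β is used for both U and V):

```python
# Minimal sketch: SGD for A ~= U V' using the per-rating update equations.
# Toy ratings, rank k, alpha, and beta are illustrative choices only.
import numpy as np

rng = np.random.default_rng(1)
# Observed (user i, movie j, rating a_ij) triples -- the nonzeros of A
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0)]
n, m, k = 3, 3, 2                     # users, movies, latent features

U = 0.1 * rng.standard_normal((n, k))
V = 0.1 * rng.standard_normal((m, k))
alpha, beta = 0.01, 0.02

for _ in range(2000):
    for i, j, a_ij in ratings:        # only the observed entries
        e_ij = a_ij - U[i] @ V[j]     # prediction error for this rating
        U[i] += alpha * 2 * (e_ij * V[j] - beta * U[i])
        V[j] += alpha * 2 * (e_ij * U[i] - beta * V[j])

pred = U @ V.T                        # pred[i, j]: predicted rating of movie j by user i
```

Note that only the observed ratings contribute update steps; the unrated entries of A are skipped entirely, which is one way the sparsity issue above is handled.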