24 Feb 2020
================================================================================
* Homework Problem 1:
  - Simpler than you think
  - Nobody recovers or dies 'during' the simulation
    * Timeframe is considerably shorter than it takes to recover/die
    * Data is from actual conference interactions
  - 100 iterations of the 'full simulation'
    * Full = all edges are simulated
    * Take averages over the 100 full simulations

For Problem 2:
- No updates to the problem specification
================================================================================
* Project

First presentation on March 5th

Expectations for the presentation:
- A 5-10 minute presentation on:
  * The project proposal 'idea' - but it should be relatively thought out
  * The specific problem being addressed
  * The specific dataset for use or testing
  * The specific algorithms to implement / other code needing to be written
  * How testing is done - how will you know if you succeeded?
  * Anything else done so far

Submit via Submitty before class
* PDFs much preferred for presentations (rolling the dice with LibreOffice)
* But code/text files/data, etc. can also be included
  + Still include some text giving an overview addressing the above points
  + So I can look back at it for later presentations/progress/etc.

Only hard requirement:
* Has to involve graphs (interaction data, algorithms on graphs, etc.)
* Grad students:
  + Can involve ongoing research (but you still need to make progress)
  + Groups can involve both grad + undergrad students
  + |Group| = 1 is fine
================================================================================
* Link prediction via Matrix Factorization

We looked at solving an optimization problem:

  min_{U,V} sum_{nonzeros} (A_ij - u_i*V*u_j')^2 + w_0*sum_{zeros} (u_i*V*u_j')^2
            + β_1*|U|^2 + β_2*|V|^2

We attempted to solve A ~= UVU'
- Intuitively: U, V contain 'latent features' that describe the interactions in our data
- Solved via gradient descent
- To summarize:
  * Can be implemented relatively easily (~10 lines)
  * Can be solved relatively quickly (~a couple of minutes)
  * Solution quality was so-so (better than random, not as good as 'social' techniques)
  * Tough to generalize our solution, tough to add in new data
================================================================================
* Recommender Systems

Definition: 'systems' that attempt to predict or 'recommend' user preferences
- Link prediction is one problem that recommender systems try to solve
  * How we differentiate the problem here: trying to predict not just a link,
    but the explicit strength or rating that a user might give to that link
- Other examples:
  * Netflix recommending movies to you
  * Amazon recommending products
  * Advertising in general - how you'll feel about a product shown to you
================================================================================
* Collaborative Filtering

One common approach for recommender systems.
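As a concrete sketch of the link-prediction factorization summarized in the previous section, the A ~= UVU' model can be fit with plain gradient descent in a few lines of NumPy. The toy graph, rank k, step size, and regularization weight below are illustrative assumptions (and, for simplicity, this sketch fits all entries equally, i.e. w_0 = 1, rather than weighting the zeros separately):

```python
# Minimal sketch: link prediction via A ~= U V U' with gradient descent.
# Toy graph, rank k, alpha, and beta are illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
A = rng.integers(0, 2, size=(n, n)).astype(float)
A = np.triu(A, 1) + np.triu(A, 1).T            # symmetric 0/1 adjacency

U = 0.1 * rng.standard_normal((n, k))          # per-node latent features
V = 0.1 * rng.standard_normal((k, k))          # feature-interaction matrix
alpha, beta = 0.005, 0.01

for _ in range(2000):
    E = A - U @ V @ U.T                        # residuals e_ij
    # Gradients of sum e_ij^2 + beta*(|U|^2 + |V|^2)
    gU = -2 * (E @ U @ V.T + E.T @ U @ V) + 2 * beta * U
    gV = -2 * (U.T @ E @ U) + 2 * beta * V
    U -= alpha * gU
    V -= alpha * gV

scores = U @ V @ U.T                           # predicted link strengths
```

Entries of `scores` above the training values at zero positions are the candidate predicted links.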
Definition: Making a selection from a wide set of possibilities (filtering) for a
user, given the known preferences of that user, of those similar to that user,
or the known preferences in general (collaborative)

Like in Homework 2:
- Filtering what links (products) we're predicting for a user
- Based off of the purchases of 'similar' users
  * We used the Jaccard index + other measures to define similarity

General approaches for collaborative filtering:
- The approach from HW2: defining explicit similarities between users
- Current method: Matrix Factorization (latent features define similarity)
- Machine learning approaches: Neural Networks, Random Forests, etc.
  * Might need to explicitly realize these features
  * Can capture the non-linear behavior that Matrix Factorization might miss
================================================================================
* Netflix challenge

~2007 or so:
- Netflix: we have a predictor for user-movie preferences (CineMatch)
- Predicts the rating (1-5) that a specific user would give a specific movie

Challenge:
- Improve the accuracy of CineMatch by 10%
- Released hundreds of millions of user-movie ratings
- If you do, we'll give you $1,000,000
- Took a few years for a team to eventually get there
  * Used an aggregation of ML techniques
  * MF by itself got a full 7% improvement

Interesting notes:
- Temporal effects of ratings
  * Some movies were rated higher immediately after viewing than later on (Patch Adams)
  * Some had the opposite effect (Memento)
- A naive approach works relatively well:
  * a_ij = 0.5 * (mean(a_{i,:}) + mean(a_{:,j}))
  * I.e., the average of the user's average rating and the movie's average rating
================================================================================
* Adapting Matrix Factorization

Consider a user-rating matrix (users on rows, one column of ratings per movie)

To represent it as a graph:
- Create a bipartite graph: B1 = {users}, B2 = {movies}, E = {ratings}
- I.e., users give movies ratings

We want to predict
what rating a user will give to a movie they haven't seen.

We can use the same approach of matrix factorization:
- Consider A as a 'bipartite adjacency matrix'
- We want to solve A ~= UV'
- A = (n x m) matrix, U = (n x k), V = (m x k)
- For each i of n users => we have k latent features u_i
- For each j of m movies => we have k latent features v_j
- Prediction for a_ij = u_i*v_j'
- Our minimization problem:
  * min_{U,V} sum_{nonzeros} (a_ij - u_i*v_j')^2
  * also considering regularization: β_1*|U|^2 + β_2*|V|^2

To derive our gradient:

  (d e_ij^2) / (d u_i) = 2*(a_ij - u_i*v_j') * (-v_j') = -2*e_ij*v_j'
  (d e_ij^2) / (d v_j) = 2*(a_ij - u_i*v_j') * (-u_i)  = -2*e_ij*u_i

With the regularization terms: 2*β_1*u_i, 2*β_2*v_j

Full update equations:

  u_i = u_i + α*2*(e_ij*v_j' - β_1*u_i)
  v_j = v_j + α*2*(e_ij*u_i  - β_2*v_j)
================================================================================
* Considerations and challenges with Collaborative Filtering

Sparsity of the data:
- We have considerably fewer ratings than 'non-ratings'

Scale of the data:
- How do we adapt techniques as the data increases in scale?

Generalization:
- How does our approach generalize to new data?
- Have to be careful that we don't 'overfit' our training data

The 'cold-start' problem:
- How do we predict for a new user?

The 'black/grey sheep' problem:
- Some users are not similar to anyone (explicitly or implicitly)
- One approach: train without these datapoints
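The full update equations from the 'Adapting Matrix Factorization' section can be sketched as per-rating stochastic gradient descent steps. The toy ratings, rank k, α, and β values below are illustrative assumptions (a single β is used for both U and V):

```python
# Minimal sketch: SGD for A ~= U V' using the per-rating update equations.
# Toy ratings, rank k, alpha, and beta are illustrative choices only.
import numpy as np

rng = np.random.default_rng(1)
# Observed (user i, movie j, rating a_ij) triples -- the nonzeros of A
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0)]
n, m, k = 3, 3, 2                     # users, movies, latent features

U = 0.1 * rng.standard_normal((n, k))
V = 0.1 * rng.standard_normal((m, k))
alpha, beta = 0.01, 0.02

for _ in range(2000):
    for i, j, a_ij in ratings:        # only the observed entries
        e_ij = a_ij - U[i] @ V[j]     # prediction error for this rating
        U[i] += alpha * 2 * (e_ij * V[j] - beta * U[i])
        V[j] += alpha * 2 * (e_ij * U[i] - beta * V[j])

pred = U @ V.T                        # pred[i, j]: predicted rating of movie j by user i
```

Note that only the observed ratings contribute update steps; the unrated entries of A are skipped entirely, which is one way the sparsity issue above is handled.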