30 Jan 2020

* Review centrality

Centrality ==> the 'importance' of vertices within a graph
- Degree centrality - the degree of a vertex; measures the strength of its
  connections within a single hop
- Closeness centrality - based on average shortest-path length; indicates a
  vertex is within a small number of hops of the rest of the graph
- Betweenness centrality - fraction of shortest paths that pass through a
  vertex; indicates the vertex is important for information diffusion through
  the network
- Eigenvector centrality - a vertex is closely connected to other 'central'
  vertices

================================================================================

* PageRank overview

Another spectral centrality measure
- Lots of applications use some variant of it
  * E.g., search, recommender systems
- You (a vertex) are important if a lot of important vertices link to you.
  Similar to baseline eigenvector centrality.
- The three models below are all equivalent, but are expressed rather
  differently. Each is a good model for various graph computations.

================================================================================

* PageRank -- Random surfer model

Overview of the model:
- A person browsing/surfing the web clicks on hyperlinks at random, i.e.,
  performs a random walk through the network.
- If the surfer lands on a zero-out-degree page, they jump to a randomly
  selected vertex in the graph.
- Also: from any page, with some probability (alpha) they jump to a random
  vertex instead of following a link.
  * Called the damping factor (note: many references instead give that name to
    1 - alpha, the probability of following a link)
  * Useful for numerical stability (the random jumps guarantee the walk
    converges to a unique distribution)
- Comparing the number of times a page is landed on against the total number
  of 'clicks' or 'visits' across all pages gives a probability distribution.
  * P(v) = visits[v] / total_visits
  * P(v) = probability that, at an arbitrary time, the surfer is on page v

================================================================================

* PageRank -- Vertex-centric calculation

Algorithm:
- All vertices have P(v) initialized to 1/|V|
- Update ==> P(v) = sum{ P(u) / d+(u) } for all u in N-(v)
  * d+(u) = out-degree of u
  * N-(v) = predecessor set of v (the vertices with an edge into v)
- Iterate until ||P_{i+1} - P_i|| < e (e = a small convergence tolerance)
- For zero-out-degree vertices ==> create edges to all other vertices

================================================================================

* PageRank -- Linear algebraic/power iteration calculation

Consider adjacency matrix A
Diagonal degree matrix D
- Row sums of A, placed along the diagonal of another NxN matrix
- Equal to the out-degree
Transition probability matrix (out edges) M' = D^-1*A
- We actually will take the transpose M = M'^T = (D^-1*A)^T

Same algorithm:
- Initialize P(v) = 1/|V| for all v
- Iterate: P_{i+1} = M*P_i
- Until ||P_{i+1} - P_i|| < e
- (Power iteration)

Recall for a square matrix X, nonzero vector w, and constant λ: λw = Xw
- w is an eigenvector
- λ is an eigenvalue
- ***PageRank vector == eigenvector of our M matrix with eigenvalue 1***

================================================================================

* Personalized PageRank

To extend the random surfer model to personalize for a vertex v or a seed
vertex set V (here V is the seed set, not the full vertex set):
- Consider a random walk starting from v (or a random vertex in V)
  * We randomly traverse out edges
  * Jump back to v/V with probability alpha (like the damping factor from
    before)
  * Jump back to v/V if we hit a zero-out-degree vertex
- The PageRank of some u is the probability that a walk ends on u
  * 'ends' = the point at which we jump back to v/V

Use cases:
- Song recommendation (vertices = users and songs, edges = listened to)
- Search (use certain topics as the seed set V, map a search to topics, then
  to pages based on the personalized PageRanks with the topic seeds V)

* Applications - link prediction for Super Bowl

* Applications - centrality analysis of citation networks
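
Sketch of the vertex-centric update from the notes above, with the random jump
(alpha) and the personalized variant folded in. This is only a minimal
illustration, not a reference implementation. Assumptions not in the notes:
the function name pagerank, its parameters (out_edges, seeds, tol, max_iter),
the default alpha = 0.15, the L1 norm for the convergence check, and handling
zero-out-degree vertices by redistributing their mass through the jump
distribution rather than by literally adding edges. The personalized walk is
started from the uniform distribution here instead of from v; when the
iteration converges the result should be the same.

```python
def pagerank(out_edges, alpha=0.15, seeds=None, tol=1e-10, max_iter=1000):
    """Vertex-centric PageRank sketch (hypothetical helper, not a library API).

    out_edges: dict mapping every vertex to the list of its successors.
    alpha:     probability of a random jump (the notes' alpha).
    seeds:     optional vertices to jump back to (personalized PageRank);
               None means jump uniformly to any vertex.
    """
    vertices = list(out_edges)
    n = len(vertices)

    # Jump distribution: uniform over the seeds, or over all vertices.
    targets = list(seeds) if seeds is not None else vertices
    jump = {v: 0.0 for v in vertices}
    for s in targets:
        jump[s] += 1.0 / len(targets)

    rank = {v: 1.0 / n for v in vertices}  # P(v) = 1/|V|
    for _ in range(max_iter):
        # Probability mass sitting on zero-out-degree vertices; the surfer
        # always jumps from these.
        dangling = sum(rank[u] for u in vertices if not out_edges[u])

        # Mass arriving via jumps (random jump + forced jump from dead ends).
        new_rank = {v: (alpha + (1.0 - alpha) * dangling) * jump[v]
                    for v in vertices}

        # Mass arriving via links: each u sends P(u) / d+(u) to its successors.
        for u in vertices:
            successors = out_edges[u]
            if successors:
                share = (1.0 - alpha) * rank[u] / len(successors)
                for v in successors:
                    new_rank[v] += share

        # L1 distance between successive iterates (assumed choice of norm).
        diff = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if diff < tol:
            break
    return rank


# Made-up toy graph for illustration: a -> b, a -> c, b -> c, c -> a, d -> c
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
global_ranks = pagerank(toy_graph)
personalized_ranks = pagerank(toy_graph, seeds=["a"])
```

In the toy graph, d has no incoming edges, so under the seeded run it can only
be reached by a jump and receives no rank unless it is one of the seeds.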