30 Jan 2020
* Review centrality
Centrality ==> the 'importance' of vertices within a graph
- Degree centrality - degree of a vertex, strength of connections within a
single hop of the vertex
- Closeness centrality - average shortest-path distance, indicates a vertex is
within a small number of hops of the rest of the graph
- Betweenness centrality - fraction of shortest paths that pass through a vertex,
indicates the vertex is important for information diffusion through the network
- Eigenvector centrality - a vertex is closely connected to other 'central'
vertices
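A minimal sketch of the first two measures, assuming an undirected, connected
graph stored as an adjacency dict. The function names and the toy graph are
made up for illustration. Closeness is reported here as the inverse of the
average shortest-path distance (a common normalization), so larger = more central.

```python
from collections import deque

def degree_centrality(adj):
    """Number of neighbors (degree) of each vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """Inverse of the average BFS distance from each vertex to the vertices it reaches."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        # (reached vertices - 1) / total distance = 1 / average distance
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

# Hypothetical toy graph: the path a - b - c - d
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(degree_centrality(path_graph))
print(closeness_centrality(path_graph))
```

The two interior vertices (b, c) score higher than the endpoints on both measures.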
================================================================================
* PageRank overview
Another spectral centrality measure
- Lots of applications use some variant of it
* E.g., search, recommender systems
- You're (a vertex) important if you have a lot of important vertices linking to
you. Similar to baseline eigenvector centrality.
- The three models below are all equivalent, but described/expressed rather
differently. Good models for various graph computations.
================================================================================
* PageRank -- Random surfer model
Overview of the model:
- A person browsing/surfing the web is randomly clicking on hyperlinks.
Performing a random walk through the network.
- If the surfer lands on a zero-out-degree page, they jump to a randomly
selected vertex in the graph.
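- Also: on every page there is a chance (alpha) that they jump to a random page
instead of following a link.
* Called the damping factor (many references instead use that name for
1 - alpha, the chance of following a link)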
* Useful for numerical stability
- Considering the number of times a page is landed on, versus the total number
of 'clicks' or 'visits' on all pages, we have a probability distribution.
* P(v) = visits[v] / total_visits
* P(v) = probability at an arbitrary time that the surfer is on page v
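A rough Monte Carlo sketch of the random surfer described above. The graph
`toy_web`, the function name, and the choice alpha = 0.15 are assumptions for
illustration only. The estimate is noisy and only approaches the true
distribution as the number of steps grows.

```python
import random

def simulate_surfer(out_edges, steps, alpha=0.15, seed=0):
    """Estimate P(v) = visits[v] / total_visits from one long random walk."""
    rng = random.Random(seed)
    vertices = list(out_edges)
    visits = {v: 0 for v in vertices}
    current = rng.choice(vertices)
    for _ in range(steps):
        visits[current] += 1
        links = out_edges[current]
        # Jump to a random page on a zero-out-degree page, or with chance alpha.
        if not links or rng.random() < alpha:
            current = rng.choice(vertices)
        else:
            current = rng.choice(links)
    return {v: visits[v] / steps for v in vertices}

# Hypothetical toy web: page d links out but nothing links to d.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
estimate = simulate_surfer(toy_web, steps=100_000)
print(estimate)
```

Page d is only ever reached by a random jump, so its estimated probability
stays small compared with page c, which every other page links to.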
================================================================================
* PageRank -- Vertex-centric calculation
Algorithm:
- All vertices have P(v) initialized to 1/|V|
- Update ==> P(v) = sum{ P(u) / d+(u) } for all u in N-(v)
* d+(u) = out degree of u
* N-(v) = predecessor set of v
- Iterate until ||P_{i+1} - P_i|| < e
- For zero-out-degree vertices ==> create edges to all other vertices
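The update rule above can be sketched as follows. This is an illustration under
assumptions, not a reference implementation: the function name, the L1 norm for
the stopping test, and the toy graph are all made up here. It follows the
update exactly as written (no alpha jump), so it is only expected to converge
on graphs that are strongly connected and not periodic.

```python
def pagerank_vertex_centric(out_edges, tol=1e-10, max_iters=1000):
    """Iterate P(v) = sum(P(u) / d+(u) for u in N-(v)) until the change is below tol."""
    vertices = list(out_edges)
    n = len(vertices)
    # Zero-out-degree vertices get edges to all other vertices (assumes n > 1).
    links = {
        u: (list(nbrs) if nbrs else [w for w in vertices if w != u])
        for u, nbrs in out_edges.items()
    }
    # N-(v): predecessor set of each vertex.
    preds = {v: [] for v in vertices}
    for u, nbrs in links.items():
        for v in nbrs:
            preds[v].append(u)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iters):
        new_rank = {
            v: sum(rank[u] / len(links[u]) for u in preds[v]) for v in vertices
        }
        diff = sum(abs(new_rank[v] - rank[v]) for v in vertices)  # L1 norm
        rank = new_rank
        if diff < tol:
            break
    return rank

# Hypothetical toy graph: a -> b, a -> c, b -> c, c -> a
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_vertex_centric(toy_graph)
print(ranks)
```

On this graph the values settle near a = 0.4, b = 0.2, c = 0.4, which can be
checked by hand against the update rule.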
================================================================================
* PageRank -- Linear algebraic/power iteration calculation
Consider adjacency matrix A
Diagonal degree matrix D
- Row sums of A, placed along the diagonal of another NxN matrix
- Equal to the out degree
Transition probability matrix (out edges) M' = D^-1*A
- We actually will take the transpose M = M'^T = (D^-1*A)^T
Same algorithm:
- Initialize P(v) = 1/|V| for all v
- Iterate: P_{i+1} = M*P_i
- Until ||P_{i+1} - P_i|| < e
- (Power iteration)
Recall for an arbitrary matrix X and constant λ: λw = Xw
- w is an eigenvector
- λ is an eigenvalue
- ***PageRank vector == eigenvector with eigenvalue of 1 on our M matrix***
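A small pure-Python sketch of the same calculation in matrix form. Assumptions:
A is a dense list-of-lists adjacency matrix, every vertex has at least one out
edge (so D is invertible), and the helper names are invented for this example.
A real implementation would more likely use a sparse matrix library.

```python
def transition_matrix(A):
    """Build M = (D^-1 * A)^T from adjacency matrix A (A[u][v] = 1 for edge u -> v)."""
    n = len(A)
    M = [[0.0] * n for _ in range(n)]
    for u in range(n):
        out_degree = sum(A[u])  # row sum of A = D[u][u]
        if out_degree == 0:
            raise ValueError("zero-out-degree vertex: D is not invertible")
        for v in range(n):
            M[v][u] = A[u][v] / out_degree  # transposed on write
    return M

def mat_vec(M, p):
    """Plain matrix-vector product M * p."""
    return [sum(m_vu * p_u for m_vu, p_u in zip(row, p)) for row in M]

def power_iteration(M, tol=1e-10, max_iters=1000):
    """Repeat P_{i+1} = M * P_i from the uniform vector until the L1 change is below tol."""
    n = len(M)
    p = [1.0 / n] * n
    for _ in range(max_iters):
        p_next = mat_vec(M, p)
        diff = sum(abs(x - y) for x, y in zip(p_next, p))
        p = p_next
        if diff < tol:
            break
    return p

# Hypothetical toy graph, vertices 0..2: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
M = transition_matrix(A)
p = power_iteration(M)
print(p)
```

Multiplying the result by M once more should leave it (almost) unchanged,
which is the eigenvalue-1 property stated above.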
================================================================================
* Personalized PageRank
To extend the random surfer model to personalize for a vertex v or a seed vertex set V:
- Consider a random walk starting from v (or random vertex in V)
* We randomly traverse out edges
* Jump back to v/V with probability alpha (like damping factor from before)
* Jump back to v/V if we hit a zero-out-degree vertex
- The personalized PageRank of some vertex u is the probability that a walk ends on u
* 'ends' = we jump back to v/V
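One way to compute this in practice is to iterate the restarting walk's
probabilities until they stop changing, rather than simulating individual
walks. The sketch below does that. Assumptions: the function name, alpha = 0.15,
the L1 stopping test, and the toy graph are illustrative, and jumps land
uniformly on the seed set.

```python
def personalized_pagerank(out_edges, seeds, alpha=0.15, tol=1e-10, max_iters=1000):
    """Stationary distribution of a walk that restarts at the seed set."""
    vertices = list(out_edges)
    seeds = set(seeds)
    # Restart distribution: uniform over the seeds, zero elsewhere.
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in vertices}
    rank = dict(restart)
    for _ in range(max_iters):
        new_rank = {v: 0.0 for v in vertices}
        back_to_seeds = 0.0
        for u in vertices:
            nbrs = out_edges[u]
            if nbrs:
                # With chance alpha jump back; otherwise follow a random out edge.
                back_to_seeds += alpha * rank[u]
                share = (1.0 - alpha) * rank[u] / len(nbrs)
                for v in nbrs:
                    new_rank[v] += share
            else:
                # Zero-out-degree vertex: always jump back to the seeds.
                back_to_seeds += rank[u]
        for v in vertices:
            new_rank[v] += back_to_seeds * restart[v]
        diff = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if diff < tol:
            break
    return rank

# Hypothetical toy graph, personalized for the single seed vertex "a".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ppr = personalized_pagerank(toy_web, seeds={"a"})
print(ppr)
```

Vertex d cannot be reached from the seed, so its score is zero here, whereas
in the unpersonalized model it would still receive some mass from random jumps.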
Use cases:
- Song recommendation (vertices = users and songs, edges = listened to)
- Search (seed V as certain topics, map a search to topics, then to pages based
on the personalized pageranks with the topic seeds V)
* Applications - link prediction for Super Bowl
* Applications - centrality analysis of citation networks