30 Jan 2020

* Review centrality

Centrality ==> the 'importance' of vertices within a graph
- Degree centrality - the degree of a vertex; measures the strength of its
                      connections within a single hop
- Closeness centrality - inverse of the average shortest-path distance to all
                         other vertices; high when a vertex is within a small
                         number of hops of the rest of the graph
- Betweenness centrality - fraction of shortest paths between pairs of other
                           vertices that pass through the vertex; high when the
                           vertex is important for information diffusion
                           through the network
- Eigenvector centrality - a vertex is central if it is closely connected to
                           other 'central' vertices
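As a quick refresher, here is a minimal Python sketch of the first two measures on a small unweighted, undirected graph. The adjacency-dict input format and the function names `degree_centrality` / `closeness_centrality` are hypothetical choices for illustration; closeness uses the common (reachable vertices) / (sum of BFS distances) form, i.e., the inverse of the average shortest-path distance.

```python
from collections import deque

def degree_centrality(adj):
    """Degree of each vertex in an undirected graph given as an adjacency dict."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """Inverse of the average unweighted shortest-path (BFS) distance
    from each vertex to every other vertex it can reach."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:                      # breadth-first search from s
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())        # sum of distances to reachable vertices
        reachable = len(dist) - 1         # exclude s itself
        result[s] = reachable / total if total > 0 else 0.0
    return result

# Hypothetical example graph: the path a - b - c - d
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(degree_centrality(adj))     # {'a': 1, 'b': 2, 'c': 2, 'd': 1}
print(closeness_centrality(adj))  # {'a': 0.5, 'b': 0.75, 'c': 0.75, 'd': 0.5}
```

The inner vertices b and c score higher on both measures, matching the intuition that they sit "between" the endpoints of the path.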
 
================================================================================
* PageRank overview

Another spectral centrality measure
- Many applications use some variant of it
  * E.g., search, recommender systems
- A vertex is important if many important vertices link to it. Similar to
  baseline eigenvector centrality.
- The three models below are equivalent but are expressed differently; each
  is a good model for a different style of graph computation.
  
================================================================================
* PageRank -- Random surfer model

Overview of the model:
- A person surfing the web randomly clicks on hyperlinks, i.e., performs a
  random walk through the network.
- If the surfer lands on a page with zero out degree, they jump to a uniformly
  random vertex in the graph.
- In addition, from every page there is a chance (alpha) that they jump to a
  uniformly random vertex instead of following a link.
  * Called the damping factor (some texts instead use that name for the
    link-following probability, 1 - alpha)
  * Useful for numerical stability: it guarantees a unique stationary
    distribution and convergence of the iteration
- Dividing the number of times the surfer lands on a page by the total number
  of 'clicks' or 'visits' across all pages gives a probability distribution.
  * P(v) = visits[v] / total_visits
  * P(v) = probability that, at an arbitrary time, the surfer is on page v
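The random surfer model can be simulated directly. The sketch below is a hypothetical Monte Carlo estimator, not a reference implementation: it assumes the graph is a dict mapping each vertex to its list of successors, and uses alpha = 0.15 as the random-jump probability (a commonly used value, equivalent to following links with probability 0.85).

```python
import random

def random_surfer(out_edges, alpha=0.15, steps=100_000, seed=0):
    """Monte Carlo PageRank estimate: simulate one surfer and count visits.

    out_edges: dict vertex -> list of successors (hypothetical input format).
    alpha: probability of a random jump from any page.
    """
    rng = random.Random(seed)
    vertices = list(out_edges)
    visits = {v: 0 for v in vertices}
    current = rng.choice(vertices)
    for _ in range(steps):
        visits[current] += 1
        succ = out_edges[current]
        if not succ or rng.random() < alpha:
            current = rng.choice(vertices)   # zero out degree, or random jump
        else:
            current = rng.choice(succ)       # follow a random out link
    total = sum(visits.values())
    # P(v) = visits[v] / total_visits
    return {v: visits[v] / total for v in vertices}

# Hypothetical example graph
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
P = random_surfer(graph)
```

Vertex d has no incoming links, so it is only reached by random jumps and its estimate stays near alpha / 4; with more steps the visit frequencies approach the PageRank values that the iterative methods compute without sampling noise.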

================================================================================
* PageRank -- Vertex-centric calculation

Algorithm:
- All vertices have P(v) initialized to 1/|V|
- Update ==> P_{i+1}(v) = sum{ P_i(u) / d+(u) } for all u in N-(v)
  * d+(u) = out degree of u
  * N-(v) = predecessor set of v
- Iterate until ||P_{i+1} - P_i|| < e (e = a small tolerance)
- For zero-out-degree vertices ==> create edges to all other vertices, so that
  every d+(u) > 0 and no probability mass is lost
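A minimal sketch of the vertex-centric update, assuming the same hypothetical dict-of-successors input (every vertex must appear as a key). Like the update rule above it has no damping term: each vertex pulls P(u) / d+(u) from its predecessors, and the loop stops when the L1 change drops below `eps` (the tolerance e). `max_iter` is a safety cap, since without damping the iteration is not guaranteed to converge on every graph.

```python
def pagerank_vertex_centric(out_edges, eps=1e-10, max_iter=1000):
    """Pull-style PageRank without damping (hypothetical helper)."""
    vertices = list(out_edges)
    n = len(vertices)
    # Zero-out-degree vertices get edges to all other vertices.
    succ = {u: (list(out_edges[u]) if out_edges[u]
                else [w for w in vertices if w != u])
            for u in vertices}
    # Predecessor sets N-(v).
    preds = {v: [] for v in vertices}
    for u in vertices:
        for v in succ[u]:
            preds[v].append(u)
    P = {v: 1.0 / n for v in vertices}           # initialize to 1/|V|
    for _ in range(max_iter):
        new_P = {v: sum(P[u] / len(succ[u]) for u in preds[v])
                 for v in vertices}
        diff = sum(abs(new_P[v] - P[v]) for v in vertices)   # L1 norm
        P = new_P
        if diff < eps:
            break
    return P

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
P = pagerank_vertex_centric(graph)
# converges to approximately {'a': 0.4, 'b': 0.2, 'c': 0.4, 'd': 0.0}
```

Vertex d ends with zero because nothing links to it; its mass is spread over a, b, c by the "edges to all other vertices" rule on the first step.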

================================================================================
* PageRank -- Linear algebraic/power iteration calculation

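Consider adjacency matrix A
Diagonal degree matrix D
- Row sums of A, placed along the diagonal of another NxN matrix
- Equal to the out degree (nonzero once zero-out-degree vertices get edges to
  all other vertices, so D^-1 exists)
Transition probability matrix (out edges) M' = D^-1*A
- Each row of M' sums to 1
- We take the transpose M = M'^T = (D^-1*A)^T, so each column sums to 1 and
  P_{i+1} = M*P_i pushes probability along out edges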

Same algorithm:
- Initialize P(v) = 1/|V| for all v
- Iterate: P_{i+1} = M*P_i
- Until ||P_{i+1} - P_i|| < e
- (Power iteration)
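The matrix form can be sketched with plain Python lists (no numerical library assumed). The function name and the 0/1 list-of-lists adjacency input are hypothetical; as in the notes, rows with zero out degree are replaced by edges to all other vertices before forming M = (D^-1*A)^T, and there is no damping term. It assumes at least two vertices.

```python
def pagerank_power_iteration(A, eps=1e-10, max_iter=1000):
    """Power iteration P_{i+1} = M P_i with M = (D^-1 A)^T (hypothetical helper).

    A: square 0/1 adjacency matrix, A[i][j] = 1 for an edge i -> j.
    """
    n = len(A)
    A = [row[:] for row in A]          # copy so the caller's matrix is untouched
    for i in range(n):
        if sum(A[i]) == 0:             # zero out degree: link to all other vertices
            A[i] = [0 if j == i else 1 for j in range(n)]
    deg = [sum(row) for row in A]      # diagonal of D (out degrees)
    # M[j][i] = A[i][j] / d+(i): transpose of the row-stochastic D^-1 A
    M = [[A[i][j] / deg[i] for i in range(n)] for j in range(n)]
    P = [1.0 / n] * n
    for _ in range(max_iter):
        new_P = [sum(M[j][i] * P[i] for i in range(n)) for j in range(n)]
        diff = sum(abs(x - y) for x, y in zip(new_P, P))
        P = new_P
        if diff < eps:
            break
    return P

# Vertices 0..3 = a, b, c, d with a -> b, a -> c, b -> c, c -> a
A = [[0, 1, 1, 0],
     [0, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 0]]
P = pagerank_power_iteration(A)
# converges to approximately [0.4, 0.2, 0.4, 0.0]
```

This is the same computation as the vertex-centric update, just written as a matrix-vector product: row j of M lists exactly the predecessors of j, weighted by 1 / d+.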

Recall for a square matrix X, a nonzero vector w and a scalar λ with Xw = λw:
- w is an eigenvector
- λ is an eigenvalue
- ***PageRank vector == eigenvector of our M matrix with eigenvalue 1 (scaled
  so its entries sum to 1)***
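Why M always has an eigenvalue of 1: a short derivation, assuming every vertex has nonzero out degree (after the zero-out-degree fix) so that D^-1 exists. Here **1** denotes the all-ones vector and P* a fixed point of the power iteration.

```latex
\begin{aligned}
(D^{-1}A)\,\mathbf{1} &= \mathbf{1}
  && \text{row } i \text{ of } D^{-1}A \text{ sums to } d^{+}(i)/d^{+}(i) = 1 \\
M^{T}\mathbf{1} &= \mathbf{1}
  && M = (D^{-1}A)^{T} \;\Rightarrow\; M^{T} = D^{-1}A \\
\det(M - \lambda I) &= \det(M^{T} - \lambda I)
  && \text{so } M \text{ also has eigenvalue } \lambda = 1 \\
M\,P^{*} &= 1 \cdot P^{*}
  && \text{fixed point of } P_{i+1} = M P_i,\ \textstyle\sum_v P^{*}(v) = 1
\end{aligned}
```

Because M is column-stochastic, no eigenvalue exceeds 1 in magnitude, which is why power iteration can converge to this eigenvector; a unique solution and guaranteed convergence additionally require the walk to be irreducible and aperiodic, which the random-jump (damping) term ensures.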

================================================================================
* Personalized PageRank

To extend the random surfer model to personalize for a vertex v or a seed set S:
- Consider a random walk starting from v (or a random vertex in S)
  * We randomly traverse out edges
  * Jump back to v/S with probability alpha (like the damping factor from
    before)
  * Jump back to v/S if we hit a zero-out-degree vertex
- The personalized PageRank of some u is the probability that a walk ends on u
  * 'ends' = we jump back to v/S

Use cases:
- Song recommendation (vertices = users and songs, edges = listened to)
- Search (choose seed sets S for certain topics; map a search to topics, then
  to pages based on their personalized PageRanks with those topic seeds S)
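A Monte Carlo sketch of the personalized model for a single seed vertex, following the "probability that a walk ends on u" definition. The function name, the dict-of-successors input, and alpha = 0.15 are assumptions for illustration, not a standard API.

```python
import random

def personalized_pagerank_mc(out_edges, seed_vertex, alpha=0.15,
                             walks=50_000, rng_seed=0):
    """Estimate personalized PageRank w.r.t. seed_vertex (hypothetical helper).

    Each walk starts at seed_vertex and follows random out edges. At every
    vertex it ends (jumps back to the seed) with probability alpha, or
    immediately if the vertex has zero out degree. The estimate for u is the
    fraction of walks that end on u.
    """
    rng = random.Random(rng_seed)
    ends = {v: 0 for v in out_edges}
    for _ in range(walks):
        current = seed_vertex
        while True:
            succ = out_edges[current]
            if not succ or rng.random() < alpha:
                break                      # walk ends here; jump back to seed
            current = rng.choice(succ)     # traverse a random out edge
        ends[current] += 1
    return {v: count / walks for v, count in ends.items()}

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ppr = personalized_pagerank_mc(graph, "a")
```

For this graph with seed a, the estimates should be close to a ≈ 0.45, b ≈ 0.19, c ≈ 0.36, and exactly 0 for d, which is unreachable from a. Unlike global PageRank, mass concentrates near the seed because every walk restarts there.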


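================================================================================
* Applications - link prediction for Super Bowl

================================================================================
* Applications - centrality analysis of citation networks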