27 Jan 2020
* Centrality - basic definitions
Measure of a vertex's 'importance' within a network
- Note: related measures are 'influence' metrics
- Social networks: important people (Twitter: number of followers)
- Road/infrastructure networks: key intersections/endpoints of bridges
- Epidemiological networks: someone who is a small # hops from everyone else
- Recall: in the context of 'hubs' from the first class
================================================================================
* Degree centrality
Defined as the number of connections a vertex has
- # incident edges
- size of the neighborhood N(v)
- For directed graphs: d_in(v) or d_out(v) or (d_in(v) + d_out(v))
* Think of bots within Twitter, spam sites within the Web
Pros: Simple to calculate; the # of connections directly determines how far
      information can flow in a single timestep when originating from some v
Cons: Easy metric to falsify (e.g., spam sites), doesn't capture much for
subsequent timesteps in a diffusive process
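The definition above fits in a few lines of Python (the edge list is a made-up
example graph):

```python
from collections import defaultdict

# Hypothetical undirected edge list for a small example graph
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]

degree = defaultdict(int)
for u, v in edges:   # each edge is incident to both endpoints
    degree[u] += 1
    degree[v] += 1

print(dict(degree))  # vertex 0 touches three edges, so its degree is 3
```

For a directed graph you would keep two counters per vertex instead, one for
in-edges and one for out-edges.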
================================================================================
* Closeness centrality
A measure of how 'close' a vertex is to all other vertices in a network
- Typically the inverse of the average shortest-path length from v to all
  other vertices
- Think in terms of how 'close' a vertex is to the 'center' of a graph
* I.e., how 'close' it is to all other vertices in the network
- How many hops/timesteps/etc. does it take for information originating at
vertex v to reach a majority of vertices in the network
Pros: Loosely determines how quickly information might be able to reach others
Cons: Difficult to calculate for all vertices in a graph -- a shortest-path
      search (e.g., BFS) from every vertex, so at least O(n^2)
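For unweighted graphs, one BFS per vertex suffices. A minimal sketch, assuming
a connected graph (the path graph below is a made-up example):

```python
from collections import deque

def closeness(adj, v):
    """(n - 1) / (sum of shortest-path lengths from v), computed via BFS;
    assumes the graph is connected."""
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return (len(adj) - 1) / sum(dist.values())

# Path graph 0-1-2-3-4: the middle vertex is 'closest' to everyone else
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(closeness(adj, 2))  # distances 1+1+2+2 = 6, so 4/6
```

Running this from every vertex is exactly the O(n * (n + m)) cost noted above.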
================================================================================
* Betweenness centrality
The proportion of information flow in a network that goes through vertex v
- The ratio of number of shortest x,y-paths through v over the number
of all shortest x,y-paths for all x != y != v in V(G)
- For information to flow from x to y, it will pass through v some proportion
of the time, relative to the above
Pros: Loosely determines key information flow 'cutpoints' within a network
Cons: Difficult to calculate -- another computation that is at least O(n^2)
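The ratio above can be computed by counting shortest paths with BFS: the number
of shortest x,y-paths through v is sigma_x(v) * sigma_v(y) whenever
d(x,v) + d(v,y) = d(x,y). A minimal sketch for connected unweighted graphs
(ordered pairs are counted, and the tiny path graph is a made-up example):

```python
from collections import deque

def bfs_counts(adj, s):
    """BFS from s: shortest-path distances and shortest-path counts (sigma)."""
    dist, sigma = {s: 0}, {s: 1}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                q.append(w)
            if dist[w] == dist[u] + 1:  # w is one step past u on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj, v):
    """Sum over ordered pairs x != y != v of the fraction of shortest
    x,y-paths passing through v (assumes a connected graph)."""
    info = {s: bfs_counts(adj, s) for s in adj}
    dist_v, sigma_v = info[v]
    total = 0.0
    for x in adj:
        if x == v:
            continue
        dist_x, sigma_x = info[x]
        for y in adj:
            if y in (x, v):
                continue
            # v lies on some shortest x,y-path iff the distances add up exactly
            if dist_x[v] + dist_v[y] == dist_x[y]:
                total += sigma_x[v] * sigma_v[y] / sigma_x[y]
    return total

# Path 0-1-2: every shortest 0,2-path passes through vertex 1
adj = {0: [1], 1: [0, 2], 2: [1]}
print(betweenness(adj, 1))  # ordered pairs (0,2) and (2,0) -> 2.0
```

This brute-force pair loop is for illustration only; production implementations
accumulate these fractions far more cleverly.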
================================================================================
* Eigenvector centrality
Note: We'll talk more about this tomorrow (specifically, PageRank)
Basically, defines a vertex's importance based on the importance of its
neighbors.
- Consider adjacency matrix A
* Eigenvector centrality ==> solved for via Ax = λx (take the leading
  eigenvector)
- You're important if your friends are important (and you have a lot of them)
Pros: Relatively easier to calculate than the above (using power iteration),
gives really good and intuitive results (PageRank via Google, Twitter
with 'who to follow')
Cons: Tough to interpret or correlate from a human's perspective
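Power iteration on the adjacency matrix fits in a few lines. A minimal sketch
-- the small non-bipartite example graph is made up; note that on a bipartite
graph plain power iteration can oscillate rather than converge:

```python
def power_iteration(A, iters=100):
    """Repeatedly apply A and renormalize; converges to the leading
    eigenvector for a connected, non-bipartite graph."""
    n = len(A)
    x = [1.0] * n
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = max(abs(val) for val in y)
        x = [val / norm for val in y]
    return x

# Triangle 0-1-2 plus a pendant vertex 3 attached to vertex 0
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
scores = power_iteration(A)
print(scores)  # vertex 0 scores highest: the most (and best-connected) neighbors
```

Vertices 1 and 2 tie (they're symmetric), and the pendant vertex 3 scores
lowest even though it has the same degree as 1 and 2 -- its one neighbor's
importance is all it inherits.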
================================================================================
* Diffusive Process
Generally, we consider diffusion and diffusive processes to statistically
measure how 'information', 'data', etc. might flow through a network
Basic models:
- Vertex-centric behaviors
* Vertex v updates its state based on the state of its neighbors (and itself)
- Complexity of 'network response' depends on the complexity of each
vertex's individual behavior
* E.g., a small change for a local region might have a large global effect
Simple example:
- Initialize two competing 'ideas'
- All vertices update to the dominant idea in their local neighborhood
  * a variant of 'label propagation' (more later) -- lots of applications
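The simple example above can be sketched with synchronous majority updates
(the cycle graph and seed labels are made up; tie-breaking is left arbitrary):

```python
def majority_step(adj, label):
    """One synchronous update: each vertex adopts the most common label in
    its closed neighborhood (itself + neighbors); ties break arbitrarily."""
    new = {}
    for v in adj:
        votes = [label[v]] + [label[u] for u in adj[v]]
        new[v] = max(set(votes), key=votes.count)
    return new

# Cycle 0-1-2-3-4-5-0 with one dissenting 'B' vertex among 'A's
adj = {0: [5, 1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
label = {0: 'A', 1: 'A', 2: 'B', 3: 'A', 4: 'A', 5: 'A'}
print(majority_step(adj, label))  # the lone 'B' is outvoted: every vertex -> 'A'
```

Here a small local minority vanishes in one step; with larger seeded regions
the labels instead settle into stable competing blocks.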
================================================================================
* Epidemiology
The study of how diseases spread -- in our context, specifically over some
network topology.
- Can be considered a diffusive process
* Differentiation: **randomness**
- Note: while 'disease' has an explicit definition, this general concept can be
applied for a number of concepts (e.g., adoption of technology, memes, etc.)
SIR epidemic model:
- S = Susceptible -- a vertex isn't infected yet
- I = Infected -- a vertex is infected and can spread the disease
- R = Removed -- a vertex is no longer infected/can't spread (immune or dead)
How the model runs:
- Initialize some subset of vertices in I
- I state on some vertex lasts for t time steps
- p = probability of transmission on each interaction between u in I, v in S
- Iterate over timesteps until no vertices in I
Notes:
- p not likely fixed in reality
- Networks are dynamic
- etc.
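The model as described runs as a small simulation. A minimal sketch, assuming
synchronous timesteps (the path graph, p, and the infectious period are
made-up parameters):

```python
import random

def sir(adj, seeds, p, t_infectious, seed=0):
    """Run SIR to completion; returns the set of vertices ever infected.
    Infected vertices move to R after t_infectious timesteps."""
    rng = random.Random(seed)
    state = {v: 'S' for v in adj}
    timer = {}
    for s in seeds:
        state[s], timer[s] = 'I', t_infectious
    while timer:                   # iterate until no vertices remain in I
        newly = []
        for u in timer:            # each I-S contact transmits w.p. p
            for v in adj[u]:
                if state[v] == 'S' and rng.random() < p:
                    newly.append(v)
        for u in list(timer):      # advance recovery clocks
            timer[u] -= 1
            if timer[u] == 0:
                state[u] = 'R'
                del timer[u]
        for v in newly:            # new infections take effect next step
            if state[v] == 'S':
                state[v], timer[v] = 'I', t_infectious
    return {v for v, st in state.items() if st != 'S'}

# Path graph with guaranteed transmission: the seed's infection sweeps across
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(sir(adj, seeds=[0], p=1.0, t_infectious=1))  # -> {0, 1, 2, 3}
```

With p < 1 the outcome is random: rerunning with different seeds gives a
distribution of outbreak sizes, which is exactly the **randomness** that
separates epidemic models from the deterministic diffusion above.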