27 Jan 2020

* Centrality - basic definitions
A measure of a vertex's 'importance' within a network
- Note: related measures are 'influence' metrics
- Social networks: important people (Twitter: number of followers)
- Road/infrastructure networks: key intersections/endpoints of bridges
- Epidemiological networks: someone who is a small # of hops from everyone else
- Recall: the 'hubs' discussed in the first class

================================================================================

* Degree centrality
Defined as the number of connections a vertex has
- # of incident edges
- Size of the neighborhood N(v)
- For directed graphs: d_in(v), d_out(v), or d_in(v) + d_out(v)
  * Think of bots on Twitter, spam sites on the Web
Pros: Simple to calculate; # of connections directly determines how information
flows in a single timestep when originating from some v
Cons: Easy metric to falsify (e.g., spam sites); doesn't capture much about
subsequent timesteps in a diffusive process

================================================================================

* Closeness centrality
A measure of how 'close' a vertex is to all other vertices in a network
- Based on the average shortest-path length from v to all other vertices
  (commonly defined as its inverse, so that 'closer' means a higher score)
- Think in terms of how 'close' a vertex is to the 'center' of a graph
  * I.e., how 'close' it is to all other vertices in the network
- How many hops/timesteps/etc. does it take for information originating at
  vertex v to reach a majority of vertices in the network
Pros: Loosely determines how quickly information might be able to reach others
Cons: Difficult to calculate for all vertices in a graph; at least O(n^2)

================================================================================

* Betweenness centrality
The proportion of information flow in a network that passes through vertex v
- The ratio of the number of shortest x,y-paths through v to the number of all
  shortest x,y-paths, taken over all x != y != v in V(G)
- For information to flow from x to y, it will pass through v some proportion
  of the time, relative to the above
Pros: Loosely identifies key information-flow 'cutpoints' within a network
Cons: Difficult to calculate; another O(n^2)

================================================================================

* Eigenvector centrality
Note: We'll talk more about this tomorrow (specifically, PageRank)
Basically, defines a vertex's importance based on the importance of its
neighbors.
- Consider adjacency matrix A
  * Eigenvector centrality ==> solved for via Ax = λx
- You're important if your friends are important (and you have a lot of them)
Pros: Relatively easier to calculate than the above (using power iteration);
gives really good and intuitive results (PageRank via Google, Twitter's 'who
to follow')
Cons: Tough for us (humans) to infer or interpret directly

================================================================================

* Diffusive processes
Generally, we consider diffusion and diffusive processes to statistically
measure how 'information' or 'data' or 'etc.' might flow through a network
Basic models:
- Vertex-centric behaviors
  * Vertex v updates its state based on the state of its neighbors (and itself)
- Complexity of the 'network response' depends on the complexity of each
  vertex's individual behavior
  * E.g., a small change in a local region might have a large global effect
Simple example:
- Initialize two competing 'ideas'
- All vertices update to the dominant idea in their local neighborhood
  * A variant of 'label propagation' (more later) -- lots of applications

================================================================================

* Epidemiology
The study of how diseases spread; in our context, specifically when
considering some network topology.
- Can be considered a diffusive process
  * Key differentiator: **randomness**
- Note: while 'disease' has an explicit definition, this general concept can
  be applied to a number of things (e.g., adoption of technology, memes, etc.)
SIR epidemic model:
- S = Susceptible -- a vertex isn't infected yet
- I = Infected -- a vertex is infected and can spread the disease
- R = Removed -- a vertex is no longer infected/can't spread (immune or dead)
How the model runs:
- Initialize some subset of vertices in I
- The I state on a vertex lasts for t timesteps
- p = probability of transmission on each interaction between u in I, v in S
- Iterate over timesteps until no vertices remain in I
Notes:
- p is not likely fixed in reality
- Networks are dynamic
- etc.
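The SIR loop above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name, the adjacency-list representation, and the default parameter values are all my own choices, and the fixed RNG seed is only for reproducibility.

```python
import random

def run_sir(adj, seeds, p=0.3, t_infect=2, rng=None):
    """Simulate the SIR model on an undirected graph.

    adj: dict mapping each vertex to a list of its neighbors
    seeds: vertices initially placed in state I
    p: transmission probability per infected-susceptible contact
    t_infect: number of timesteps a vertex stays in state I
    Returns the set of vertices that were ever infected (I or R at the end).
    """
    rng = rng or random.Random(0)  # fixed seed for a reproducible demo
    state = {v: "S" for v in adj}
    timer = {}                     # remaining infectious timesteps per vertex
    for v in seeds:
        state[v] = "I"
        timer[v] = t_infect

    # Iterate over timesteps until no vertices remain in I
    while any(s == "I" for s in state.values()):
        newly = []
        for u, s in state.items():
            if s != "I":
                continue
            # Each I-S contact transmits independently with probability p
            for v in adj[u]:
                if state[v] == "S" and rng.random() < p:
                    newly.append(v)
        # Decrement infection timers; expired vertices move to R
        for u in list(timer):
            timer[u] -= 1
            if timer[u] == 0:
                state[u] = "R"
                del timer[u]
        for v in newly:
            if state[v] == "S":   # guard against duplicate contacts
                state[v] = "I"
                timer[v] = t_infect
    return {v for v, s in state.items() if s != "S"}
```

With p = 1.0 on a connected graph every vertex is eventually infected, and with p = 0.0 the outbreak never leaves the seed set -- a quick sanity check that matches the notes above.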