16 Jan 2020

* Review last class; graphs, connectivity

Graph = (V, E)
  V = {a, b, c, ... }
  E = {e, f, g, h, ... }
  e = (a, b), f = (a, c), ...
  - Can also include labels, weights, etc.

Connectivity
  - A graph is connected if for all u,v in V(G), there exists a u,v-path
  - Connected components: maximal connected subgraphs

Graph processing:
  - The concept of 'vertex-centric processing'
  - Every vertex has some 'state'
  - We iteratively update these states by examining the states of neighbors
  - Problem: in large-diameter graphs, information needs many iterations to
    flow across the graph
      * See 'pointer jumping' for ways to mitigate this issue (certain algs)

================================================================================

* Biconnectivity and k-connectivity

A graph is biconnected if there exist at least two vertex-disjoint u,v-paths
for all u,v in V(G)
  - Two paths are 'vertex-disjoint' if they share no vertices
  - Except for the start and end vertices u,v

This implies => there are no cut vertices in a biconnected graph G
  - Removal of a 'cut vertex' disconnects an otherwise connected graph
  - Correspondingly -> removal of a 'cut edge' disconnects a graph

Equivalent terminology:
  Cut vertex == articulation vertex
  Cut edge == bridge

Why do we care in graph mining?
  - Failure points, weak points
  - For any network that we need to be connected
      * Road network, technological networks, etc.
      * Vertices become disconnected from the larger network

Biconnectivity problem:
  - Identifying cut vertices and cut edges within a network
  - Biconnected components: maximal biconnected subgraphs

Most real-world graphs are not biconnected (or connected)
  - Reason: trivial or near-trivial components
  - Trivial component: disconnected vertex (connectivity)
                       degree-one vertex (biconnectivity)
  - The neighbor of any degree-one vertex is a cut vertex
    (provided that neighbor has at least one other neighbor)

Generalize this concept: k-connectivity
  - k => the minimum number of vertices we must remove to disconnect a graph
  - 1-connected == connected (solve: DFS/BFS/propagation)
  - 2-connected == biconnected (solve: Hopcroft-Tarjan aka DFS)
  - 3-connected == triconnected (solve: Hopcroft-Tarjan aka DFS)
  - k-connected ==> solve via network flow
  - This relates to network 'robustness'
      * k = number of vertices we must remove to disconnect the graph
  - Often, we'll talk about this concept in terms of a single vertex
      * How many vertices must we remove to disconnect vertex v?

================================================================================

* Directed graphs, strong and weak connectivity

Directed graph D = (V,E)
  V = {a, b, c, ... }
  E = {e, f, g, ... }
  e = (a -> b), f = (b -> c)

Consider edge list = (a -> b), (a -> c), (b -> a)
  d_in(a) = 1, the number of 'predecessors'
  d_out(a) = 2, the number of 'successors'

Weak connectivity: a graph is weakly connected if there exists a u,v-path
for all u,v in V IF we ignore edge directionality

Strong connectivity: a graph is strongly connected if there exists a u,v-path
for all u,v in V WHILE following edge directionality

The concepts of k-connectivity can be extended to directed graphs
  - Again, think in terms of network 'robustness'

================================================================================

* The web graph

Take a look at the 'Readings' for today on the course website

Web graphs:
  - Vertices: web pages/domains/etc
  - Edges: hyperlinks between sites (often directed)

For the homework:
  - Might be useful: Forward-backward algorithm
  - See: https://www.sandia.gov/~apinar/papers/irreg00.pdf
  - DCSC algorithm for SCC

# extract the SCC and the IN, OUT sets relative to root vertex v
FWBW(D, v):
  OUT = BFS_out_edges(D, v)      # everything reachable from v
  IN  = BFS_in_edges(D, v)       # everything that can reach v
  SCC = intersection(OUT, IN)    # the SCC containing v
  return SCC, IN, OUT

* Discuss first homework

* Calculating graph properties

* Code: Working with real data, BFS and the FW-BW algorithm
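
A minimal Python sketch of the FWBW step above, run on the small edge list from the directed-graph section. This is not the course's reference code: the names `bfs_reachable` and `fwbw` and the edge-list input format are assumptions made for illustration.

```python
from collections import deque


def bfs_reachable(adj, root):
    """Return the set of vertices reachable from root via adjacency dict adj."""
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):      # vertices with no listed edges have no neighbors
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def fwbw(edges, v):
    """One forward-backward step: return (SCC, IN, OUT) relative to root v."""
    out_adj, in_adj = {}, {}
    for a, b in edges:                          # directed edge a -> b
        out_adj.setdefault(a, []).append(b)     # b is a successor of a
        in_adj.setdefault(b, []).append(a)      # a is a predecessor of b
    out_set = bfs_reachable(out_adj, v)         # forward sweep over out-edges
    in_set = bfs_reachable(in_adj, v)           # backward sweep over in-edges
    return out_set & in_set, in_set, out_set


# Edge list from the notes: (a -> b), (a -> c), (b -> a)
edges = [("a", "b"), ("a", "c"), ("b", "a")]
scc, in_set, out_set = fwbw(edges, "a")
print(sorted(scc), sorted(in_set), sorted(out_set))
```

Rooted at a, this gives OUT = {a, b, c}, IN = {a, b}, and SCC = {a, b}. Vertex c is reachable from a but cannot get back, so it stays outside the SCC. To find every SCC, the DCSC approach in the linked paper repeats this step on the leftover vertex sets (OUT minus SCC, IN minus SCC, and the remainder). That recursion is left out of the sketch.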