16 Jan 2020
* Review last class; graphs, connectivity
Graph = (V, E)
V = {a, b, c, ... }
E = {e, f, g, h, ... }
e = (a, b), f = (a, c) ...
- Can also include labels, weights, etc.
Connectivity
- A graph is connected if for all u,v in V(G), there exists a u,v-path
- Connected components: maximal connected subgraphs
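A minimal sketch of the connectivity definitions above, assuming an undirected graph stored as a vertex list and an edge list. The function name connected_components and the example graph are illustrative, not from the lecture.

```python
from collections import deque

def connected_components(vertices, edges):
    """Return a list of vertex sets, one per connected component."""
    # Build an undirected adjacency structure.
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    components = []
    for s in vertices:
        if s in seen:
            continue
        # BFS from s collects everything that has a path to s.
        comp = {s}
        seen.add(s)
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        components.append(comp)
    return components

# Hypothetical example graph with two components.
V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("a", "c"), ("d", "e")]
components = connected_components(V, E)
is_connected = len(components) == 1  # connected iff exactly one component
```

Each BFS stops only when no further vertex can be added, which is what makes each returned set maximal.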
Graph processing:
- The concept of 'vertex centric processing'
- Every vertex has some 'state'
- We iteratively update these states by examining the states of neighbors
- Problem: large diameter graphs might affect # iterations for information flow
* See 'pointer jumping' for ways to mitigate this issue (certain algs)
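A rough sketch of vertex-centric processing, assuming the 'state' is a component label and each round replaces a vertex's label with the minimum over itself and its neighbors. The function name propagate_min_labels and both example graphs are assumptions made for illustration. The point is the round count: a path needs many more rounds than a star on the same number of vertices.

```python
def propagate_min_labels(adj):
    """Synchronous min-label propagation.

    adj maps each vertex to a list of neighbors (undirected).
    Returns (final_state, rounds), where rounds counts the rounds
    in which at least one vertex changed its state.
    """
    state = {v: v for v in adj}  # initial state: own id
    rounds = 0
    changed = True
    while changed:
        changed = False
        new_state = {}
        for v in adj:
            # Examine own state and the states of all neighbors.
            best = min([state[v]] + [state[u] for u in adj[v]])
            new_state[v] = best
            if best != state[v]:
                changed = True
        state = new_state
        if changed:
            rounds += 1
    return state, rounds

# Path 0-1-2-3-4-5: the label 0 must travel the whole length.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
# Star with center 0: every leaf sees label 0 immediately.
star = {0: [1, 2, 3, 4, 5], **{i: [0] for i in range(1, 6)}}

path_state, path_rounds = propagate_min_labels(path)
star_state, star_rounds = propagate_min_labels(star)
```

On these inputs the path takes 5 rounds and the star takes 1, even though both have six vertices.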
================================================================================
* Biconnectivity and k-connectivity
A graph is biconnected if there exist at least two vertex-disjoint u,v-paths
for all u,v in V(G)
- Two paths are 'vertex-disjoint' if they don't overlap in terms of vertices
- Except for start,end vertices u,v
This implies => there exist no cut vertices in biconnected graph G
- Removal of a 'cut vertex' disconnects an otherwise connected graph
- Correspondingly -> removal of a 'cut edge' disconnects a graph
Equivalent terminology:
Cut vertex == articulation vertex
Cut edge == bridge
Why do we care in graph mining?
- Failure points, weak points
- For any network that we need to be connected
* Road network, technological networks, etc.
* Vertices become disconnected from the larger network
Biconnectivity problem:
- Identifying cut vertices and cut edges within a network
- Biconnected components: maximal biconnected subgraphs
Most real-world graphs are not biconnected (or connected)
- Reason: trivial or near-trivial components
- Trivial component: disconnected vertex (connectivity)
degree-one vertex (biconnectivity)
- The neighbor of any degree-one vertex is a cut vertex (as long as that neighbor has at least one other neighbor)
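A sketch of how cut vertices and cut edges can be found with one DFS, using the standard discovery-time / low-link bookkeeping. This is a simplified recursive version for small simple graphs (no parallel edges, shallow recursion), not necessarily the implementation used in class. The function name and example graph are illustrative.

```python
def cut_vertices_and_edges(adj):
    """Return (cut_vertices, cut_edges) of a simple undirected graph.

    adj maps each vertex to a list of neighbors. Cut edges are
    returned as frozensets {u, w}.
    """
    disc = {}            # DFS discovery time of each vertex
    low = {}             # lowest discovery time reachable from the subtree
    cut_vertices = set()
    cut_edges = set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for w in adj[u]:
            if w not in disc:
                children += 1
                dfs(w, u)
                low[u] = min(low[u], low[w])
                # Subtree of w cannot climb above u: u separates it.
                if parent is not None and low[w] >= disc[u]:
                    cut_vertices.add(u)
                # Subtree of w cannot even reach u by another route.
                if low[w] > disc[u]:
                    cut_edges.add(frozenset((u, w)))
            elif w != parent:
                low[u] = min(low[u], disc[w])  # back edge
        # The DFS root is a cut vertex iff it has 2+ DFS children.
        if parent is None and children > 1:
            cut_vertices.add(u)

    for s in adj:
        if s not in disc:
            dfs(s, None)
    return cut_vertices, cut_edges

# Triangle a-b-c with a degree-one vertex d hanging off c.
G = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
cv, ce = cut_vertices_and_edges(G)
```

Here c (the neighbor of the degree-one vertex d) is the only cut vertex, and the edge c-d is the only bridge.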
Generalize this concept: k-connectivity
- k => the minimum number of vertices we must remove to disconnect a graph
- 1-connected == connected (solve: DFS/BFS/propagation)
- 2-connected == biconnected (solve: Hopcroft-Tarjan aka DFS)
- 3-connected == triconnected (solve: Hopcroft-Tarjan aka DFS)
- k-connected ==> solve via network flow
- This relates to network 'robustness'
* k = number of vertices we must remove to disconnect the graph
- Often, we'll talk about this concept in terms of a single vertex
* How many vertices must we remove to disconnect vertex v?
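To make the definition of k concrete, here is a brute-force sketch that tries every vertex subset of increasing size until the graph falls apart. This is NOT the network-flow approach mentioned above; it is exponential and only meant for tiny examples. Function names are illustrative, and the convention that a complete graph on n vertices has connectivity n-1 is assumed.

```python
from itertools import combinations

def is_connected(adj, removed=frozenset()):
    """Check connectivity of the graph after deleting `removed`."""
    remaining = [v for v in adj if v not in removed]
    if not remaining:
        return True
    seen = {remaining[0]}
    stack = [remaining[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(remaining)

def vertex_connectivity(adj):
    """Smallest number of vertices whose removal disconnects the graph."""
    n = len(adj)
    if not is_connected(adj):
        return 0
    # Removing k vertices must leave at least two vertices behind.
    for k in range(1, n - 1):
        for subset in combinations(adj, k):
            if not is_connected(adj, frozenset(subset)):
                return k
    return n - 1  # complete graph (by convention)

cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path3 = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}

k_cycle = vertex_connectivity(cycle4)
k_path = vertex_connectivity(path3)
k_complete = vertex_connectivity(k4)
```

The 4-cycle is biconnected (k = 2), the path has a cut vertex (k = 1), and the complete graph on four vertices gives k = 3.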
================================================================================
* Directed graphs, strong and weak connectivity
Directed graph D = (V,E)
V = {a, b, c, ... }
E = {e, f, g, ... }
e = (a -> b), f = (b -> c)
Consider edge list = (a -> b), (a -> c), (b -> a)
d_in(a) = 1, the number of 'predecessors'
d_out(a) = 2, the number of 'successors'
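A quick check of the degree counts above, assuming the edge list is stored as (source, target) pairs. Variable names are illustrative.

```python
from collections import Counter

# Edge list from the example: (a -> b), (a -> c), (b -> a)
edges = [("a", "b"), ("a", "c"), ("b", "a")]

d_out = Counter(u for u, _ in edges)  # edges leaving each vertex
d_in = Counter(v for _, v in edges)   # edges entering each vertex

successors_a = {v for u, v in edges if u == "a"}
predecessors_a = {u for u, v in edges if v == "a"}
```

A Counter returns 0 for vertices that never appear on that side of an edge, so d_out["c"] is 0 without special handling.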
Weak connectivity: a graph is weakly connected if there exists a u,v-path for all
u,v in V IF we ignore edge directionality
Strong connectivity: a graph is strongly connected if there exists a u,v-path for
all u,v in V WHILE following edge directionality
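A sketch of both checks, assuming a non-empty vertex list and a list of directed (source, target) edges. For the strong check it uses the fact that a digraph is strongly connected exactly when one root reaches every vertex along out-edges and every vertex reaches the root (i.e., the root reaches everything along in-edges). Function names are illustrative.

```python
from collections import deque

def reach(adj, root):
    """Return every vertex reachable from root by BFS over adj."""
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def is_weakly_connected(vertices, edges):
    # Ignore direction: insert every edge both ways.
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return len(reach(adj, vertices[0])) == len(vertices)

def is_strongly_connected(vertices, edges):
    # Follow direction: search forward and backward from one root.
    out_adj = {v: [] for v in vertices}
    in_adj = {v: [] for v in vertices}
    for u, v in edges:
        out_adj[u].append(v)
        in_adj[v].append(u)
    root, n = vertices[0], len(vertices)
    return len(reach(out_adj, root)) == n and len(reach(in_adj, root)) == n

V = ["a", "b", "c"]
E = [("a", "b"), ("a", "c"), ("b", "a")]  # edge list from the example
weak = is_weakly_connected(V, E)
strong = is_strongly_connected(V, E)
strong_after_fix = is_strongly_connected(V, E + [("c", "a")])
```

The example edge list is weakly but not strongly connected, because nothing leaves c. Adding the hypothetical edge (c -> a) makes it strongly connected.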
The concepts of k-connectivity can be extended to directed graphs
- Again, think in terms of network 'robustness'
================================================================================
* The web graph
Take a look at the 'Readings' for today on the course website
Web graphs:
- Vertices: web pages/domains/etc
- Edges: hyperlinks between sites (often directed)
For the homework:
- Might be useful: Forward-backward algorithm
- See: https://www.sandia.gov/~apinar/papers/irreg00.pdf
- DCSC algorithm for SCC
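# extract the SCC and the IN, OUT sets relative to root vertex v
FWBW(D, v):
  OUT = BFS_out_edges(D, v)    # vertices reachable from v
  IN = BFS_in_edges(D, v)      # vertices that can reach v
  SCC = intersection(OUT, IN)  # the SCC containing v
  return SCC, IN, OUT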
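One possible Python rendering of the pseudocode above, assuming the digraph is given as a vertex list plus a list of (source, target) edges. The helper names bfs and fwbw and the example graph are illustrative. As in the pseudocode, IN and OUT here still include the SCC itself; subtract the SCC if you want the strict IN and OUT sets.

```python
from collections import deque

def bfs(adj, root):
    """Return the set of vertices reachable from root over adj."""
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def fwbw(vertices, edges, root):
    """Forward-backward step: return (SCC, IN, OUT) relative to root."""
    out_adj = {v: [] for v in vertices}
    in_adj = {v: [] for v in vertices}
    for u, v in edges:
        out_adj[u].append(v)  # follow edges forward
        in_adj[v].append(u)   # follow edges backward
    out_set = bfs(out_adj, root)  # everything root can reach
    in_set = bfs(in_adj, root)    # everything that can reach root
    scc = out_set & in_set        # mutually reachable with root
    return scc, in_set, out_set

# Hypothetical example: cycle a -> b -> c -> a, with d feeding in
# and e hanging off the cycle.
V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a"), ("c", "e")]
scc, in_set, out_set = fwbw(V, E, "a")
strict_in = in_set - scc
strict_out = out_set - scc
```

In this example the SCC is {a, b, c}, with d as the only strict IN vertex and e as the only strict OUT vertex.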
* Discuss first homework
* Calculating graph properties
* Code: Working with real data, BFS and the FW-BW algorithm