03 Feb 2020
Today: Evolution of social networks
Next week: (general) Link Prediction
Note: no class next two Thursdays
* Discuss Homework
Going to be released "soon"
Two primary problems
1.) Epidemiology simulation w/ centrality considerations
2.) Link prediction on Amazon purchase network
================================================================================
* Social networks
Really: human interaction networks
- Vertices ==> people
- Edges ==> some form of interaction (communication, friendship, etc.)
Properties: small-world, skewed, etc.
================================================================================
* Triadic closure
If A and B are friends, A and C are friends, but B and C are not friends
==> higher likelihood that B and C will eventually become friends
==> 'Closing the triad', where [(A,B),(A,C)] is a triad
Why is this observed?
- Opportunity: A hangs out with B, A hangs out with C. Eventually, A may want to
get together with both friends at the same time
- Trust: inherent trust associated with friendships. A trusts C, so why
shouldn't B trust C?
- Incentive: There's a reason why A might want to close the triad. A doesn't
want to keep having to do separate things with each of B and C.
Over time:
- It's observed that triads close at a higher rate than edges form between
random pairs of unconnected vertices in a graph.
- I.e., a graph's clustering coefficient increases over time
* Clustering coefficient = ratio of closed triads to all possible triads
centered at some vertex v
* Probability that any two of my friends are also friends
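Sketch of the per-vertex definition above (plain Python; the adjacency-dict
representation, the function name, and the toy graph are assumptions for
illustration, not from the lecture):

```python
from itertools import combinations

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # no triads centered at v; 0 by convention
    # A triad (v,a),(v,b) is closed when edge (a,b) exists.
    closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / (k * (k - 1) / 2)

# Toy graph: A is friends with B, C, D; only the triad through B and C is closed.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(local_clustering(adj, "A"))  # 1 closed triad out of 3 possible -> 1/3
```

If edge (B,D) or (C,D) later forms, the value for A rises, which is the
"clustering coefficient increases over time" effect in miniature.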
================================================================================
* Strong and weak ties
Example: Research from the 60s on where people heard about their current job.
=> Was discovered that most people heard about it through
acquaintances (weak tie) vs. close friends (strong tie)
=> Think about how weak ties in a social network connect disparate
parts of the network. Novel information that diffuses through the
network is likely to travel quickly between 'clusters' via these
weaker ties. (betweenness, cut edges/vertices, etc.)
Weak ties = cut edges?
- In a social network, likely not.
- But, we still have the notion of 'local bridges' == local 'cut' edges
* Local 'cut' e = (u,v): u and v share no neighbors, so removing e increases
the u,v-path length by at least 2 (from 1 to 3 or more)
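Sketch of a local-bridge test based on the definition above: run a BFS that
ignores the edge in question and check the detour length. The names `span` and
`is_local_bridge` and the toy graph are assumptions for illustration.

```python
from collections import deque

def span(adj, u, v):
    """Distance from u to v when edge (u, v) is ignored (inf if unreachable)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if {x, y} == {u, v}:
                continue  # skip the edge under test
            if y not in dist:
                dist[y] = dist[x] + 1
                if y == v:
                    return dist[y]
                queue.append(y)
    return float("inf")

def is_local_bridge(adj, u, v):
    # No common neighbor means the shortest detour has length 3 or more.
    return span(adj, u, v) > 2

# Toy graph: triangle A-B-C, 4-cycle A-C-D-E, pendant vertex F hanging off E.
adj = {
    "A": {"B", "C", "E"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D", "A", "F"},
    "F": {"E"},
}
print(is_local_bridge(adj, "C", "D"))  # detour C-A-E-D exists, length 3
print(is_local_bridge(adj, "A", "B"))  # common neighbor C, detour length 2
```

A true cut edge such as (E,F) is the limiting case: its span is infinite.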
How does this relate to the notion of triadic closure?
- Strong triadic closure property: If A has strong ties to both B and C, and B
and C aren't connected, then B and C are more likely to become friends than
if one or both of A's ties to B and C were weak.
- Local bridges ==> weak ties
* Why? Suppose local bridge (A,B) were a strong tie and A also had a strong tie
to some other neighbor C. The strong triadic closure property would then push
B and C to connect, giving A and B a common neighbor, so (A,B) would no
longer be a local bridge.
Can we quantify this?
- Consider a network that has defined strengths along the edges
- First: correlate the neighborhood overlaps relative to tie strength
- Second: Remove ties weakest-first vs. strongest-first.
Which order disconnects the graph quickest? (# of edge removals)
- Observations: Neighborhood overlap and tie strength are positively correlated.
Removal of weak ties disconnects a network significantly faster.
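Sketch of both measurements on a toy weighted graph. Assumptions: overlap is
taken as (shared neighbors) / (neighbors of either endpoint, excluding the
endpoints themselves); the edge weights are made up; function names are
hypothetical. This only illustrates the procedure and is not evidence for the
observations above.

```python
def neighborhood_overlap(adj, u, v):
    """Shared neighbors of u, v over all neighbors of either (minus u, v)."""
    common = adj[u] & adj[v]
    either = (adj[u] | adj[v]) - {u, v}
    return len(common) / len(either) if either else 0.0

def build_adj(nodes, edges):
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def is_connected(nodes, edges):
    """True if every node is reachable from the first using only `edges`."""
    adj = build_adj(nodes, edges)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(nodes)

def removals_until_disconnected(nodes, weighted_edges, weakest_first=True):
    """Remove edges in strength order; count removals until the graph splits."""
    order = sorted(weighted_edges, key=lambda e: e[2], reverse=not weakest_first)
    for removed in range(1, len(order) + 1):
        remaining = [(u, v) for u, v, _ in order[removed:]]
        if not is_connected(nodes, remaining):
            return removed
    return len(order)

# Two tight triangles (strength 3) joined by one weak tie C-D (strength 1).
nodes = ["A", "B", "C", "D", "E", "F"]
weighted_edges = [
    ("A", "B", 3), ("B", "C", 3), ("A", "C", 3),
    ("D", "E", 3), ("E", "F", 3), ("D", "F", 3),
    ("C", "D", 1),
]
adj = build_adj(nodes, [(u, v) for u, v, _ in weighted_edges])

for u, v, w in weighted_edges:
    print(u, v, "strength", w, "overlap", neighborhood_overlap(adj, u, v))

print(removals_until_disconnected(nodes, weighted_edges, weakest_first=True))
print(removals_until_disconnected(nodes, weighted_edges, weakest_first=False))
```

On real data one would track the size of the largest component as a function
of the fraction of edges removed, rather than stopping at the first split.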
================================================================================
* Homophily
Homophily: "birds of a feather flock together", "like attracts like"
- Or: similar people tend to be friends with each other
- Selection: we inherently seek out people similar to ourselves
- Influence: we inherently become more similar to people we spend time with
Expand the notion of triadic closure:
- Consider affiliation networks
* Can be bipartite; in any case, has at least two distinct types of vertices
(e.g., people and where they go to school)
* We can generalize the notion of triadic closure, strong/weak ties, etc.
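One possible reading of that generalization, as a sketch: treat shared
affiliations the way common friends are treated in triadic closure, so two
people with more affiliations in common are stronger candidates for a future
edge. The people, affiliations, and function name are hypothetical.

```python
from itertools import combinations

# Person -> set of affiliations (the "other" vertex type in the network).
membership = {
    "ann": {"school_x", "chess_club"},
    "bob": {"school_x"},
    "cat": {"school_x", "chess_club"},
    "dan": {"school_y"},
}

def shared_affiliations(membership):
    """Number of affiliations each pair of people has in common."""
    return {
        (p, q): len(membership[p] & membership[q])
        for p, q in combinations(sorted(membership), 2)
    }

for pair, count in shared_affiliations(membership).items():
    print(pair, count)
```

Pairs with a count of 0 (e.g., anyone paired with dan) have no affiliation
through which a closure of this kind could occur.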
================================================================================
* Dynamic and Temporal networks
Dynamic network: changes over time, addition of links and/or vertices
Temporal network: Network that has time-data associated with it
Experiment:
- We expect stronger triads to close over time
- Define strength based on the size of the neighborhood overlap between u,v
- Look at various k = overlaps, at some time t_0
- What's the probability, as a function of k, that edge (u,v) was created by
time t_1 > t_0?
Observation:
- There appears to be a rather strong correlation between size of neighborhood
overlap for u,v and whether or not edge (u,v) forms
- Empirical validation of triadic closure
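Sketch of the experiment on two snapshots of a toy network. Assumptions: k is
the raw count of common neighbors at t_0, only pairs unconnected at t_0 are
considered, and the snapshots and function names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

def build_adj(nodes, edges):
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def closure_rate_by_overlap(adj_t0, adj_t1):
    """For each k: fraction of pairs unconnected at t0 with k common
    neighbors that are connected at t1."""
    total = defaultdict(int)
    formed = defaultdict(int)
    for u, v in combinations(sorted(adj_t0), 2):
        if v in adj_t0[u]:
            continue  # edge already present at t0
        k = len(adj_t0[u] & adj_t0[v])
        total[k] += 1
        if v in adj_t1.get(u, set()):
            formed[k] += 1
    return {k: formed[k] / total[k] for k in sorted(total)}

nodes = ["A", "B", "C", "D", "E"]
edges_t0 = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C"), ("A", "E")]
edges_t1 = edges_t0 + [("B", "C"), ("A", "D"), ("B", "E")]

rates = closure_rate_by_overlap(build_adj(nodes, edges_t0),
                                build_adj(nodes, edges_t1))
for k, p in rates.items():
    print("k =", k, "P(edge by t1) =", p)
```

Enumerating all pairs is quadratic in the number of vertices, so on a large
network one would restrict to pairs at distance 2 (those with k >= 1).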
Going forward ==> A lot of link prediction using network topology is based
on the principles we discussed today.
(triadic closure, homophily)
================================================================================
* Growth models
Note: the strength of these observations (triadic closure, homophily, etc.) is
dependent on the underlying growth process of the network
Many networks grow while exhibiting 'preferential attachment'
- "Rich get richer"
- The probability a vertex gets more edges grows with its degree
- Barabasi-Albert model:
* Start with m_0 vertices
* A new vertex v is added
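* Attach v to some existing vertex u with probability p_u = k_u / degree_sum,
where k_u is the degree of u and degree_sum = sum of k_j over all existing
vertices j (repeated for each of v's m <= m_0 new edges)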
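Sketch of the growth process above. Assumptions not in the notes: the m_0
seed vertices form a ring (so every starting degree is nonzero), each new
vertex adds m edges to distinct targets, and the function name and defaults
are hypothetical.

```python
import random

def barabasi_albert(n, m0=3, m=2, seed=None):
    """Grow a graph to n vertices by degree-proportional attachment."""
    if not (1 <= m <= m0 and m0 >= 3 and n >= m0):
        raise ValueError("need n >= m0 >= 3 and 1 <= m <= m0")
    rng = random.Random(seed)
    # Assumed seed graph: a ring over vertices 0..m0-1.
    edges = [(i, (i + 1) % m0) for i in range(m0)]
    # Each vertex appears once per incident edge, so a uniform pick from this
    # list selects u with probability k_u / degree_sum.
    endpoints = [x for e in edges for x in e]
    for v in range(m0, n):
        targets = set()
        while len(targets) < m:
            # Redraw on repeats so the m targets are distinct.
            targets.add(rng.choice(endpoints))
        for u in targets:
            edges.append((v, u))
            endpoints.extend((v, u))
    return edges

edges = barabasi_albert(50, m0=3, m=2, seed=0)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(len(edges), "edges; max degree", max(degree.values()))
```

Redrawing on repeats means only the first pick per new vertex follows p_u
exactly; later picks are conditioned on being distinct from earlier ones.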