03 Feb 2020

Today: Evolution of social networks
Next week: (general) Link Prediction
Note: no class next two Thursdays

* Discuss Homework
  Going to be released "soon"
  Two primary problems
    1.) Epidemiology simulation w/ centrality considerations
    2.) Link prediction on Amazon purchase network

================================================================================
* Social networks
  Really: human interaction networks
    - Vertices ==> people
    - Edges ==> some form of interaction (communication, friendship, etc.)
  Properties: small-world, skewed degree distribution, etc.

================================================================================
* Triadic closure
  If A and B are friends, A and C are friends, but B and C are not friends
    ==> higher likelihood that B and C will eventually become friends
    ==> 'Closing the triad', where [(A,B),(A,C)] is a triad

  Why is this observed?
    - Opportunity: A hangs out with B, A hangs out with C. Eventually, A may
      want to get together with both friends at the same time
    - Trust: inherent trust associated with friendships. A trusts C, so why
      shouldn't B trust C?
    - Incentive: There's a reason why A might want to close the triad. A
      doesn't want to keep having to do separate things with each of B and C.

  Over time:
    - It's observed that triads close at a higher rate than edges form
      between random non-adjacent vertices in a graph.
    - I.e., a graph's clustering coefficient increases over time
      * Clustering coefficient = ratio of closed triads to all possible
        triads centered at some vertex v
      * Equivalently: the probability that any two of my friends are also
        friends

================================================================================
* Strong and weak ties
  Example: Research from the 60s on where people heard about their current job.
    => It was discovered that most people heard about it through
       acquaintances (weak ties) rather than close friends (strong ties)
    => Think about how weak ties in a social network connect disparate parts
       of the network.
       Novel information that diffuses through the network is likely to
       travel quickly between 'clusters' via these weaker ties.
       (betweenness, cut edges/vertices, etc.)

  Weak ties = cut edges?
    - In a social network, likely not.
    - But, we still have the notion of 'local bridges' == local 'cut' edges
      * e = (u,v) is a local bridge if removing e increases the u,v-path
        length by at least 2 (from 1 to at least 3, i.e., u and v have no
        common neighbors)

  How does this relate to the notion of triadic closure?
    - Strong triadic closure property: if A has strong ties to both B and C,
      and B and C aren't connected, then B and C are more likely to become
      friends than if one or both of A's ties to B and C were weak.
    - Local bridges ==> weak ties
      * Why? Suppose A has a strong tie to at least one non-bridge neighbor C.
        If the local bridge (A,B) were also strong, the strong triadic
        closure property would make the edge (B,C) likely to form. That would
        give A and B a common neighbor, so (A,B) would no longer be a local
        bridge.

  Can we quantify this?
    - Consider a network that has defined strengths along the edges
    - First: correlate the neighborhood overlap of each edge's endpoints with
      its tie strength
    - Second: Remove ties weakest-first vs. strongest-first. Which order
      disconnects the graph quickest? (# of edge removals)
    - Observations: Neighborhood overlap and tie strength are positively
      correlated. Removing weak ties first disconnects a network
      significantly faster.

================================================================================
* Homophily
  Homophily: "birds of a feather flock together", "like attracts like"
    - Or: similar people tend to be friends with each other
    - Selection: we inherently seek out people similar to ourselves
    - Influence: we inherently become more similar to people we spend time
      with

  Expand the notion of triadic closure:
    - Consider affiliation networks
      * Can be bipartite, but at least two distinct types of vertices
        (e.g., people and where they go to school)
      * We can generalize the notion of triadic closure, strong/weak ties,
        etc.
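
  Aside (sketch, not from lecture): neighborhood overlap and the local-bridge
  test from the strong/weak ties discussion above can be written in a few
  lines. The adjacency representation (dict of vertex -> set of neighbors),
  the function names, and the toy graph are assumptions made for
  illustration. Overlap is taken here as (# common neighbors) / (# vertices
  adjacent to u or v, excluding u and v themselves), which is one common
  definition.

```python
# Sketch only: undirected graph stored as dict of vertex -> set of neighbors.

def add_edge(adj, u, v):
    """Insert the undirected edge (u, v)."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def neighborhood_overlap(adj, u, v):
    """Common neighbors of u and v over neighbors of either (excluding u, v)."""
    common = adj[u] & adj[v]
    either = (adj[u] | adj[v]) - {u, v}
    return len(common) / len(either) if either else 0.0

def is_local_bridge(adj, u, v):
    """(u, v) is a local bridge when it is an edge and u, v share no neighbors."""
    return v in adj[u] and not (adj[u] & adj[v])

# Hypothetical toy graph: two triangles joined by the single edge (C, D).
adj = {}
for u, v in [("A", "B"), ("A", "C"), ("B", "C"),
             ("C", "D"),
             ("D", "E"), ("D", "F"), ("E", "F")]:
    add_edge(adj, u, v)

for u, v in [("A", "B"), ("A", "C"), ("C", "D")]:
    print(u, v, neighborhood_overlap(adj, u, v), is_local_bridge(adj, u, v))
```

  Edges inside a triangle get nonzero overlap, while the edge joining the two
  triangles has overlap 0 and is flagged as a local bridge.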
================================================================================
* Dynamic and Temporal networks
  Dynamic network: changes over time, addition of links and/or vertices
  Temporal network: network that has time-data associated with it

  Experiment:
    - We expect stronger triads to close over time
    - Define strength based on the size of the neighborhood overlap between
      u,v
    - Look at various overlap sizes k at some time t_0
    - What's the probability, as a function of k, that an edge (u,v) was
      created by time t_1?

  Observation:
    - There appears to be a rather strong correlation between size of
      neighborhood overlap for u,v and whether or not edge (u,v) forms
    - Empirical validation of triadic closure

  Going forward
    ==> A lot of link prediction using network topology is based on the
        principles we discussed today. (triadic closure, homophily)

================================================================================
* Growth models
  Note: the strength of these observations (triadic closure, homophily, etc.)
  is dependent on the underlying growth process of the network

  Many networks grow while exhibiting 'preferential attachment'
    - "Rich get richer"
    - The probability a vertex gets more edges is a function of its degree
    - Barabasi-Albert model:
      * Start with m_0 vertices
      * A new vertex v is added
      * Attach v to some existing vertex u with probability
        p_u = k_u / degree_sum, where k_u is the degree of u and degree_sum
        is the sum of the degrees of all existing vertices
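
  Aside (sketch, not from lecture): a minimal simulation of the attachment
  rule p_u = k_u / degree_sum. Several details are assumptions that the notes
  do not specify: the m_0 seed vertices are joined in a path so every degree
  is nonzero, each new vertex adds m edges (m = 1 by default), and for m > 1
  repeated targets are simply redrawn. The function name and parameters are
  hypothetical.

```python
import random

def preferential_attachment(n, m0=2, m=1, seed=None):
    """Grow a graph to n vertices; return its undirected edges as (u, v) pairs.

    Assumed here, not stated in the notes: the m0 seed vertices form a path,
    and each new vertex attaches to m distinct existing vertices.
    """
    if m0 < 2 or not 1 <= m <= m0 or n < m0:
        raise ValueError("need m0 >= 2, 1 <= m <= m0, and n >= m0")
    rng = random.Random(seed)

    # Seed graph: path 0 - 1 - ... - (m0 - 1).
    edges = [(i, i + 1) for i in range(m0 - 1)]

    # Each vertex appears in `endpoints` once per incident edge, so a uniform
    # draw from this list picks u with probability k_u / degree_sum.
    endpoints = [x for edge in edges for x in edge]

    for v in range(m0, n):
        targets = set()
        while len(targets) < m:          # redraw duplicates (only matters if m > 1)
            targets.add(rng.choice(endpoints))
        for u in targets:
            edges.append((u, v))
            endpoints.extend((u, v))     # both degrees grow by one
    return edges

# Usage: tally degrees to see the "rich get richer" skew.
edges = preferential_attachment(500, seed=1)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print("max degree:", max(degree.values()))
print("vertices with degree 1:", sum(1 for k in degree.values() if k == 1))
```

  The repeated-endpoint list avoids recomputing degree_sum at every step, at
  the cost of memory proportional to the number of edges.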