Ph.D. Theses

The Existence and Discovery of Overlapping Communities in Large-Scale Networks

By Stephen Kelley
Advisor: Mark Goldberg
December 2, 2009

The grouping of vertices in networks into a set of communities has long been an important component of network analysis. These communities, often called groups, modules, or coalitions, aid in understanding the underlying dynamics of networks. Current methods are varied in their approaches to generating groups, but most work on the assumption that the natural structure of the groups within the network are hierarchical. In this case, each group is composed of two or more disjoint subgroups forming a dendrogram of communities.

The creation of large data-sets representing networks of social relations and interactions has resulted in a gradual shift away from detecting disjoint communities in a hierarchy. It is intuitive that, within a social network, individuals be allowed to associate with any number of groups. In the real world, one might interact with communities centered around faith, employment, friends, and family. This shift has produced a number of methods and definitions of non-disjoint groups.

Despite the development of a number of community detection algorithms which allow groups to overlap, the use of methods identifying disjoint communities is still the norm. This is due, in large part, to the lack of a quantitative justification for overlapping groups. Previous work relies on intuitive justification for the application of overlapping detection methods rather than examining how often there is significant, natural overlap between communities in a given network.

In this thesis, current methods for the identification of overlapping communities are distilled into a set of axioms which lay out minimal criteria that a community detection algorithm's results should satisfy. In addition, a new algorithm, Connected Iterative Scan, is presented and tested across a wide range of synthetic and real networks and compared to previously proposed algorithms producing disjoint and overlapping communities. The algorithm is also applied on a large network of social interactions and the overlap between pairs of communities is analyzed. This analysis demonstrates that much of the overlap between pairs of groups is natural and many the associations contained by the set of groups could not be expressed in a disjoint grouping.

Return to main PhD Theses page