LFD-Lab


Software:

Disclaimer: All software is provided for free and as is, with no explicit or implicit warranty. In other words, you use completely at your own risk.


Fast Graph Embedding
  • Sampled Spectral Distance Embedding (SSDE)
    Contributors: Ali Civril, Eli Bocke-Rivel, Jonathan Purnell, Malik Magdon-Ismail
    Summary: A given (undirected and connected) graph is embedded into Cartesian space in such a way as to minimize the loss of information contained in pairwise distances. The algorithm executes quickly due to clever sampling of the nodes and a matrix approximation method that runs in linear time with respect to the number of nodes in the graph.
    Code: SSDE; Test Data; Documentation.
    Papers: SSDE (2007).

Unsupervised Learning
  • Low-Rank Perturbation Gaussian Mixture Model (LRPGMM)
    Contributors: Jonathan Purnell, Saurabh Paul, Malik Magdon-Ismail
    Summary: A modified EM algorithm for training a Gaussian Mixture Model (GMM). The defining difference is the approximation of the covariance matrix for each mixture. The GMM can be trained using the full covariance matrix, a diagonal matrix approximation (i.e. only the variances), or using a low-rank perturbed diagonal matrix decomposition (LRPDD). With the LRPDD, the covariance matrix is represented by a linear number of parameters (with respect to the dimension of the data being modeled). This decomposition allows for a linear time (vs. quadratic or cubic) training the GMM and has less overfitting that occurs with a full rank covariance matrix. Additionally, there is a naive overlapping clustering algorithm built-in to the code that clusters the data based on the posterior probabilities.
    Code: LRPGMM; Documentation.
    Note: This code relies on the ALGLIB library.
    Papers: LRPGMM (Ideal 2010); LRPGMM (IJDMMM 2011).

Graph Clustering; Overlapping Communities Ranking and Prominence in Networks
  • ABC-Centrality (attentive betweenness centrality)
    Contributors: Xiaohui Lu, Sibel Adali, Malik Magdon-Ismail
    Summary: Betweenness centrality measures how critical a node is to information flow in a network. A node is critical (and hence should have high betweeness) if it is on many shortest paths. Two shortcomings of such a measure are:
    • It ignores nodes on ``almost shortest'' paths;
    • It assumes that a node can provide the same attention to information flow through each of those shortest paths, no matter how many shortest paths the node controls.
    There have been attempts to address these concerns in the literature, with partial success. We provide a new measure, ABC centrality, that measures criticality by the amount of attention a node devotes to the information flow between other nodes. Our measure addresses both the aforementioned concerns and can be computed efficiently. It performs as well or better than betweenness centrality on both stylized networks and large scale real data networks, and hence provides a useful tool for measuring node criticality.
    Code: ABC-Centrality; Documentation.
    Papers: ABC-Centrality (SocialCom 2011).

  • iHypR (Iterative Hyperedge Ranking)
    Contributors: Xiaohui Lu, Sibel Adali, Malik Magdon-Ismail
    Summary: We present an iterative hyperedge ranking (iHypR) algorithm for computing prominence of actors in heterogeneous collaborative networks. The algorithm builds on the assumption that prominent actors collaborate on prominent objects, and prominent objects are naturally grouped into prominent clusters or groups (hyperedges in a graph). iHypR makes use of the relationships between actors, objects and hyperedges to compute a global prominence score for the actors in the network. The algorithm is customized for networks of collaborations, but it is generally applicable without further tuning. A core component in our algorithm is the clustering of objects into hyperedges. We use SSDE-cluster for this. (SSDE-cluster could be replaced with any overlapping clustering algorithm, but we have found the SSDE-cluster performs best.
    Code: iHypR; Documentation.
    Papers: iHypR (submitted).

Hidden Groups
  • Extracting Hidden Groups and their Structure from Streaming Interaction Data
    Contributors: Ben Caulfield, Mark K. Goldberg, Mykola Hayvanovych, Malik Magdon-Ismail, and William A. Wallace
    Summary: This program is an implementation of the algorithms described in "Extracting Hidden Groups and their Structure from Streaming Interaction Data" by Mark K. Goldberg, Mykola Hayvanovych, Malik Magdon-Ismail, and William A. Wallace. This program may be used to find chain and sibling communication triples from interaction data. There are two sub-programs that may be run with this program: The first is used to determine at what frequency triples must occur in order to be considered significant (called the "significance threshold"). This is done by generating lists of random communications based on the original data and finding the number of triples that occur in these lists. The second sub-program seperates the triples into cliques and may be used to find larger communication-structures in the data.
    Code: Hidden_Groups; Documentation.
    Papers: Hidden_Groups.pdf.

  • Find Matchings Between a List of Sent Messages and a List of Received Messages
    Contributors: Lingxun Hu, Ben Caulfield, and Malik Magdon-Ismail
    Summary: In a social network, each node is able to send or receive messages. Each node can send messages to any other nodes in the network except to itself. The only information we can acquire is whether a node sent or received a message and the time it sent or received that message. Our goal is to find as many as possible valid matchings between the sent messages and received messages, based on the information we can get. We use two different algorithms: dynamic programming methods and chain algorithm from the paper below. The output of this program can be used as the input of program "Extracting Hidden Groups and their Structure from Streaming Interaction Data".
    Code: Find_matchings.zip;
    Papers: Hidden_Groups.pdf.