LFD-Lab
Software:
Fast Graph Embedding
Unsupervised Learning
Graph Clustering; Overlapping Communities; Community Evolution Detection
Ranking and Prominence in Networks
Hidden Groups
Portfolio Allocation
Semantic Fragment Matching
Dual Set Sparsification of Matrices
Learning a Linear Model to Rank
Eye Tracking
Greedy Deep Learning (Node-By-Node)
Network Signature
Subgraph Classification using Network Signatures
Network Lens: Node Classification in Topologically Heterogeneous Networks
Intrinsic Scale of a Network
Supervised Mixture of Experts (SGMM, SBMM)
HMM-Boost
Pre Cluster and Merge (PCM)
Disclaimer: All software is provided free of charge and as is, with no explicit or implicit warranty. In other words, you use it entirely at your own risk.



Fast Graph Embedding
  • Sampled Spectral Distance Embedding (SSDE)
    Contributors: Ali Civril, Eli Bocek-Rivele, Jonathan Purnell, Malik Magdon-Ismail
    Summary: A given (undirected and connected) graph is embedded into Cartesian space in such a way as to minimize the loss of information contained in pairwise distances. The algorithm executes quickly due to clever sampling of the nodes and a matrix approximation method that runs in linear time with respect to the number of nodes in the graph.
    Code: SSDE; Test Data; Documentation.
    Papers: SSDE (2007).
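    Example (illustrative Python sketch): the general idea of embedding from a sample of pairwise distances can be seen in a landmark-MDS sketch, shown below; the sampling strategy and linear-time matrix approximation that SSDE actually uses are described in the paper, and the function and parameter names here are hypothetical.

      # Landmark-MDS sketch of sampled-distance embedding (not the exact SSDE algorithm).
      import numpy as np
      import networkx as nx

      def sampled_embedding(G, dim=2, n_landmarks=20, seed=0):
          rng = np.random.default_rng(seed)
          nodes = list(G.nodes())
          land_idx = rng.choice(len(nodes), size=min(n_landmarks, len(nodes)), replace=False)
          landmarks = [nodes[i] for i in land_idx]
          # BFS distances from each landmark to every node (graph assumed connected).
          D = []
          for l in landmarks:
              lengths = nx.single_source_shortest_path_length(G, l)
              D.append([lengths[v] for v in nodes])
          D = np.array(D, dtype=float)                              # shape (k, n)
          # Classical MDS on the landmark-landmark distances.
          Dl = D[:, land_idx]
          k = Dl.shape[0]
          J = np.eye(k) - np.ones((k, k)) / k
          B = -0.5 * J @ (Dl ** 2) @ J
          w, V = np.linalg.eigh(B)
          top = np.argsort(w)[::-1][:dim]
          L = V[:, top] * np.sqrt(np.maximum(w[top], 0))            # landmark coordinates (k, dim)
          # Place every node by triangulation against the landmark coordinates.
          mean_sq = (Dl ** 2).mean(axis=0)
          X = -0.5 * ((D ** 2) - mean_sq[:, None]).T @ np.linalg.pinv(L).T   # (n, dim)
          return dict(zip(nodes, X))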

Unsupervised Learning
  • Low-Rank Perturbation Gaussian Mixture Model (LRPGMM)
    Contributors: Jonathan Purnell, Saurabh Paul, Malik Magdon-Ismail
    Summary: A modified EM algorithm for training a Gaussian Mixture Model (GMM). The defining difference is how the covariance matrix of each mixture component is approximated. The GMM can be trained using the full covariance matrix, a diagonal approximation (i.e. only the variances), or a low-rank perturbed diagonal matrix decomposition (LRPDD). With the LRPDD, each covariance matrix is represented by a number of parameters that is linear in the dimension of the data being modeled. This decomposition allows linear-time (vs. quadratic or cubic) training of the GMM and reduces the overfitting that occurs with a full-rank covariance matrix. Additionally, a naive overlapping clustering algorithm built into the code clusters the data based on the posterior probabilities.
    Code: LRPGMM; Documentation.
    Note: This code relies on the ALGLIB library.
    Papers: LRPGMM (Ideal 2010); LRPGMM (IJDMMM 2011).
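    Example (illustrative Python sketch): the reason a low-rank perturbed diagonal covariance Sigma = diag(d) + V V^T gives cheap computations is that the Gaussian log-density can be evaluated with the Woodbury identity and the matrix determinant lemma, as below; the EM updates that LRPGMM actually performs are in the papers.

      import numpy as np

      def lrpd_logpdf(x, mu, d, V):
          """log N(x; mu, diag(d) + V V^T) in O(dim * r^2) per point."""
          r = V.shape[1]
          z = x - mu
          Dinv_z = z / d                                   # D^{-1} z
          Dinv_V = V / d[:, None]                          # D^{-1} V
          M = np.eye(r) + V.T @ Dinv_V                     # r x r capacitance matrix
          # Woodbury identity: z^T Sigma^{-1} z
          quad = z @ Dinv_z - (Dinv_z @ V) @ np.linalg.solve(M, V.T @ Dinv_z)
          # Determinant lemma: log det Sigma = log det M + sum(log d)
          logdet = np.linalg.slogdet(M)[1] + np.sum(np.log(d))
          return -0.5 * (quad + logdet + len(d) * np.log(2 * np.pi))

      # Example: 100-dimensional Gaussian with a rank-2 perturbation.
      rng = np.random.default_rng(0)
      d = rng.uniform(0.5, 2.0, size=100)
      V = rng.normal(size=(100, 2))
      print(lrpd_logpdf(rng.normal(size=100), np.zeros(100), d, V))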

Graph Clustering; Overlapping Communities; Community Evolution Detection
  • LOS Temporal (LOST)
    Contributors: James Thompson, Malik Magdon-Ismail
    Summary: A program that detects community evolutions in temporal social networks in a two-step process: first, uninterrupted, tightly connected evolutions are detected; second, similar evolutions are merged together.
    Code: Evolution Detection Code;
    Papers: Detecting Uninterrupted Evolutions. Merging Evolutions.
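    Example (illustrative Python sketch): a toy version of the first step, chaining together communities in consecutive snapshots that overlap strongly; the matching criterion here (thresholded Jaccard overlap) and the merging step are simplified stand-ins for what the code actually does.

      def jaccard(a, b):
          a, b = set(a), set(b)
          return len(a & b) / len(a | b)

      def link_evolutions(snapshots, threshold=0.3):
          """snapshots: list of lists of communities (each community is a set of node ids)."""
          chains, finished = [[c] for c in snapshots[0]], []
          for communities in snapshots[1:]:
              extended, used = [], set()
              for chain in chains:
                  matches = [c for c in communities if jaccard(chain[-1], c) >= threshold]
                  if matches:
                      best = max(matches, key=lambda c: jaccard(chain[-1], c))
                      extended.append(chain + [best])
                      used.add(frozenset(best))
                  else:
                      finished.append(chain)            # this evolution ends here
              chains = extended + [[c] for c in communities if frozenset(c) not in used]
          return finished + chains

      snapshots = [[{1, 2, 3}, {4, 5}], [{1, 2, 3, 6}, {4, 5, 7}], [{1, 2, 6}, {8, 9}]]
      print(link_evolutions(snapshots))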

  • Anonymizing Vertices
    Contributors: James Thompson, Malik Magdon-Ismail
    Summary: Randomly anonymizes a given percentage of actors or vertices in a temporal community structure.
    Code: Vertex Anonymizer;
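    Example (illustrative Python sketch): the basic operation, assuming the community structure is given as a list of snapshots, each a list of member sets; the packaged tool reads and writes its own file format.

      import random

      def anonymize(snapshots, fraction=0.2, seed=0):
          """Consistently replace a random fraction of actors with anonymous labels."""
          rng = random.Random(seed)
          actors = sorted({v for snap in snapshots for comm in snap for v in comm})
          hidden = rng.sample(actors, int(fraction * len(actors)))
          relabel = {v: f"anon_{i}" for i, v in enumerate(sorted(hidden))}
          return [[{relabel.get(v, v) for v in comm} for comm in snap] for snap in snapshots]

      print(anonymize([[{"alice", "bob"}, {"carol"}], [{"alice", "carol"}]], fraction=0.5))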

  • Synthetic Network Model
    Contributors: James Thompson, Malik Magdon-Ismail
    Summary: A program to generate a synthetic temporal social network with characteristics similar to those found in real world networks. The temporal network is represented as a series of static networks, frequently called snapshots. A community structure is embedded in each snapshot and community evolutions are embedded in the transitions between snapshots.
    Code: Synthetic Network Generator;
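    Example (illustrative Python sketch): a toy generator in the same spirit, producing a series of snapshots with planted communities and a small amount of membership churn between snapshots; the packaged generator controls the community evolutions and network statistics far more carefully.

      import random
      import networkx as nx

      def toy_temporal_network(n_snapshots=5, n_communities=4, size=25,
                               p_in=0.3, p_out=0.01, churn=0.05, seed=0):
          rng = random.Random(seed)
          n = n_communities * size
          membership = [i // size for i in range(n)]        # planted community of each node
          snapshots = []
          for _ in range(n_snapshots):
              # a small fraction of nodes switch communities between snapshots
              for v in range(n):
                  if rng.random() < churn:
                      membership[v] = rng.randrange(n_communities)
              G = nx.Graph()
              G.add_nodes_from(range(n))
              for u in range(n):
                  for v in range(u + 1, n):
                      p = p_in if membership[u] == membership[v] else p_out
                      if rng.random() < p:
                          G.add_edge(u, v)
              snapshots.append((G, list(membership)))
          return snapshots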

Ranking and Prominence in Networks
  • ABC-Centrality (attentive betweenness centrality)
    Contributors: Xiaohui Lu, Sibel Adali, Malik Magdon-Ismail
    Summary: Betweenness centrality measures how critical a node is to information flow in a network. A node is critical (and hence should have high betweenness) if it is on many shortest paths. Two shortcomings of such a measure are:
    • It ignores nodes on "almost shortest" paths;
    • It assumes that a node can provide the same attention to information flow through each of those shortest paths, no matter how many shortest paths the node controls.
    There have been attempts to address these concerns in the literature, with partial success. We provide a new measure, ABC centrality, that measures criticality by the amount of attention a node devotes to the information flow between other nodes. Our measure addresses both of the aforementioned concerns and can be computed efficiently. It performs as well as or better than betweenness centrality on both stylized networks and large-scale real data networks, and hence provides a useful tool for measuring node criticality.
    Code: ABC-Centrality; Documentation.
    Papers: ABC-Centrality (SocialCom 2011).
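    Example (illustrative Python sketch): for reference, standard betweenness centrality, the measure whose shortcomings are listed above; the ABC measure itself is defined in the SocialCom 2011 paper and implemented in the package.

      import networkx as nx

      G = nx.karate_club_graph()
      bc = nx.betweenness_centrality(G)        # fraction of shortest paths through each node
      top = sorted(bc, key=bc.get, reverse=True)[:5]
      print([(v, round(bc[v], 3)) for v in top])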

  • iHypR (Iterative Hyperedge Ranking)
    Contributors: Xiaohui Lu, Sibel Adali, Malik Magdon-Ismail
    Summary: We present an iterative hyperedge ranking (iHypR) algorithm for computing prominence of actors in heterogeneous collaborative networks. The algorithm builds on the assumption that prominent actors collaborate on prominent objects, and prominent objects are naturally grouped into prominent clusters or groups (hyperedges in a graph). iHypR makes use of the relationships between actors, objects and hyperedges to compute a global prominence score for the actors in the network. The algorithm is customized for networks of collaborations, but it is generally applicable without further tuning. A core component in our algorithm is the clustering of objects into hyperedges. We use SSDE-cluster for this. (SSDE-cluster could be replaced with any overlapping clustering algorithm, but we have found that SSDE-cluster performs best.)
    Code: iHypR; Documentation.
    Papers: iHypR (submitted).
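    Example (illustrative Python sketch): a schematic of the mutual-reinforcement idea, in which actor, object, and hyperedge scores feed into one another through the incidence structure; the exact update rules, normalizations, and the SSDE-cluster step are in the paper.

      import numpy as np

      def ihypr_like(A_ao, A_oh, n_iter=50):
          """A_ao: actors x objects incidence; A_oh: objects x hyperedges incidence."""
          n_a, n_o = A_ao.shape
          n_h = A_oh.shape[1]
          a, o, h = np.ones(n_a), np.ones(n_o), np.ones(n_h)
          for _ in range(n_iter):
              o = A_ao.T @ a + A_oh @ h        # objects inherit from actors and hyperedges
              o /= np.linalg.norm(o)
              h = A_oh.T @ o                   # hyperedges inherit from their objects
              h /= np.linalg.norm(h)
              a = A_ao @ o                     # actors inherit from their objects
              a /= np.linalg.norm(a)
          return a, o, h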

Hidden Groups
  • Extracting Hidden Groups and their Structure from Streaming Interaction Data
    Contributors: Ben Caulfield, Mark K. Goldberg, Mykola Hayvanovych, Malik Magdon-Ismail, and William A. Wallace
    Summary: This program is an implementation of the algorithms described in "Extracting Hidden Groups and their Structure from Streaming Interaction Data" by Mark K. Goldberg, Mykola Hayvanovych, Malik Magdon-Ismail, and William A. Wallace. The program may be used to find chain and sibling communication triples in interaction data. There are two sub-programs: The first determines at what frequency triples must occur in order to be considered significant (the "significance threshold"). This is done by generating lists of random communications based on the original data and counting the triples that occur in these lists. The second sub-program separates the triples into cliques and may be used to find larger communication structures in the data.
    Code: Hidden_Groups; Documentation.
    Papers: Hidden_Groups.pdf.
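    Example (illustrative Python sketch): the significance-threshold idea, in which the triple count in the real stream is compared against counts from randomized streams; the definitions of chain and sibling triples and the exact null model used by the program are in the paper.

      import random

      def significance_threshold(count_triples, messages, n_random=100, quantile=0.95, seed=0):
          """count_triples: function mapping a list of (sender, receiver, time) to a triple count."""
          rng = random.Random(seed)
          null_counts = []
          for _ in range(n_random):
              times = [t for (_, _, t) in messages]
              rng.shuffle(times)                       # keep senders/receivers, permute times
              shuffled = [(s, r, t) for (s, r, _), t in zip(messages, times)]
              null_counts.append(count_triples(shuffled))
          null_counts.sort()
          return null_counts[int(quantile * (n_random - 1))]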

  • Find Matchings Between a List of Sent Messages and a List of Received Messages
    Contributors: Lingxun Hu, Ben Caulfield, and Malik Magdon-Ismail
    Summary: In a social network, each node can send and receive messages; a node can send to any other node in the network except itself. The only information available is whether a node sent or received a message and the time at which it did so. The goal is to find as many valid matchings as possible between sent and received messages based on this information. We use two algorithms: a dynamic programming method and the chain algorithm from the paper below. The output of this program can be used as the input to the program "Extracting Hidden Groups and their Structure from Streaming Interaction Data".
    Code: Find_matchings.zip;
    Papers: Hidden_Groups.pdf.
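    Example (illustrative Python sketch): a greedy baseline for the matching problem, pairing each received message with the earliest unmatched send that precedes it at a different node; the package itself uses the dynamic programming and chain algorithms from the paper.

      def greedy_match(sends, receives):
          """sends/receives: lists of (node, time); returns list of (send, receive) pairs."""
          sends = sorted(sends, key=lambda x: x[1])
          receives = sorted(receives, key=lambda x: x[1])
          used, pairs = [False] * len(sends), []
          for r_node, r_time in receives:
              for i, (s_node, s_time) in enumerate(sends):
                  if not used[i] and s_node != r_node and s_time <= r_time:
                      used[i] = True
                      pairs.append(((s_node, s_time), (r_node, r_time)))
                      break
          return pairs

      print(greedy_match([("a", 1), ("b", 2)], [("c", 3), ("a", 4)]))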

Portfolio Allocation
  • Optimization Algorithms for Portfolio Allocation
    Contributors: Andrew Bolin, Malik Magdon-Ismail
    Summary: Modern portfolio theory (MPT) attempts to minimize risk for a given level of expected return or maximize expected return for a given level of risk. The code provided uses linear/quadratic programming to solve MPT optimizations and find efficient portfolios. Different optimization algorithms are included to handle different models of risk. This project is meant to simulate trading using MPT optimization algorithms. The code allows users to build and rebalance portfolios generated with real data using the MPT algorithms.
    Code: portsim.zip;
    Papers: PorftolioOptimization.pdf
    Data: prices.txt; SP100.txt; SP500.txt; stock_legend.txt.
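    Example (illustrative Python sketch): the closed-form minimum-variance portfolio (fully invested, shorting allowed), one of the simplest MPT optimizations; the package solves the constrained versions with linear/quadratic programming and supports the other risk models.

      import numpy as np

      def min_variance_weights(returns):
          """returns: T x N matrix of asset returns; gives the minimum-variance weights."""
          cov = np.cov(returns, rowvar=False)
          ones = np.ones(cov.shape[0])
          w = np.linalg.solve(cov, ones)
          return w / (ones @ w)

      rng = np.random.default_rng(0)
      R = rng.normal(0.001, 0.02, size=(250, 5))     # one year of daily returns, 5 assets
      w = min_variance_weights(R)
      print(w, "variance:", w @ np.cov(R, rowvar=False) @ w)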

Semantic Fragment Matching
  • Fragment Matching Through i-degree and i-partitioning
    Contributors: John Schwartz, Mark Goldberg, Malik Magdon-Ismail
    Summary: This project attempts to identify graphical queries within a given semantically labeled database structure and return a set of potential matches. Primarily, this is done by indexing the database and query using i-degree technology for faster graph feature comparison. The included code will output a restricted set of potential matches for each query node, in a .MX2 file. The included data is derived from DBpedia, and includes 10 example queries (fragments) with the produced output.
    Code: FragmentMatchCode.zip.
    Data: FragmentMatchData.zip.
    Supplemental: Read Me;
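    Example (illustrative Python sketch): a much simplified version of the candidate-filtering step, keeping only database nodes whose label matches and whose degree is at least the query node's degree; the i-degree indexing and i-partitioning used by the code refine this considerably (see the Read Me).

      import networkx as nx

      def candidate_matches(db, query):
          """db, query: networkx graphs with a 'label' attribute on every node."""
          out = {}
          for q in query.nodes():
              out[q] = [v for v in db.nodes()
                        if db.nodes[v].get("label") == query.nodes[q].get("label")
                        and db.degree(v) >= query.degree(q)]
          return out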

Dual Set Sparsification of Matrices

Learning a Linear Model to Rank
  • Linear Programming Heuristic
    Contributors: Malik Magdon-Ismail
    Summary: The matlab code LinearRankerLPsoft.m takes as input a data matrix with ordered data and outputs a linear model that attempts to preserve that order using an infinity norm linear programming heuristic with "soft margin".
    Matlab Code: LinearRankerLPsoft.m (uses MATLAB linprog in the optimization toolbox).
    Matlab Code: TestLPranker.m (a wrapper that tests the function on a random problem).
    Supplemental: A Short Note on the Algorithm
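    Example (illustrative Python sketch): one way to set up a soft-margin, infinity-norm ranking LP of the kind described in the note, using scipy's linprog; LinearRankerLPsoft.m is the reference implementation, and the consecutive-pair constraint structure and the constant C below are assumptions.

      import numpy as np
      from scipy.optimize import linprog

      def linear_ranker_lp(X, C=1.0):
          """X: n x d matrix whose rows are ordered from highest to lowest rank."""
          n, d = X.shape
          m = n - 1                                  # one margin constraint per adjacent pair
          # variable vector: [w (d), t (1), xi (m)]; minimize t + C * sum(xi)
          c = np.concatenate([np.zeros(d), [1.0], C * np.ones(m)])
          # |w_j| <= t   ->   w_j - t <= 0   and   -w_j - t <= 0
          A_abs = np.zeros((2 * d, d + 1 + m))
          A_abs[:d, :d] = np.eye(d)
          A_abs[d:, :d] = -np.eye(d)
          A_abs[:, d] = -1.0
          # w.(x_i - x_{i+1}) >= 1 - xi_i   ->   -(x_i - x_{i+1}).w - xi_i <= -1
          diffs = X[:-1] - X[1:]
          A_rank = np.zeros((m, d + 1 + m))
          A_rank[:, :d] = -diffs
          A_rank[:, d + 1:] = -np.eye(m)
          A_ub = np.vstack([A_abs, A_rank])
          b_ub = np.concatenate([np.zeros(2 * d), -np.ones(m)])
          bounds = [(None, None)] * d + [(0, None)] * (1 + m)
          res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
          return res.x[:d]

      X = np.array([[3.0, 1.0], [2.0, 1.5], [1.0, 2.0]])   # rows already in rank order
      print(linear_ranker_lp(X))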

Eye Tracking
  • Find Patterns of How People Play Tetris From Eye Tracking Data
    Contributors: Lingxun Hu, Malik Magdon-Ismail, Wayne Gray
    Summary: This package has two .jar files: FindSingleMoves.jar and GetLongerChains.jar. The output of FindSingleMoves.jar is the input of GetLongerChains.jar.
    Code: FindEyeTrackPatternsPackage

Greedy Deep Learning (Node-By-Node)
  • GN (ranks data according to distance to first feature) and GCN (partitions data into supervised classes)
    Contributors: Ke Wu, Malik Magdon-Ismail
    Summary: An efficient algorithm for pre-training a deep network that trains each node sequentially on partitions of the data. The features produced by the algorithm are also more interpretable.
    Python Code: Code.zip. ReadMe.txt.
    Papers: (MS Thesis) (arXiv)

Network Signature
  • Get a structured 2D image embedding (signature) of a subgraph of a given network.
    Contributors: Kshiteesh Hegde, Malik Magdon-Ismail
    Summary: Given a network and a subgraph size, the code saves the structured image of a subgraph. The subgraph is obtained by performing a random walk on the network whose length is determined by the given subgraph size. A BFS-based ordering scheme given in this ASONAM paper is applied to arrive at the signature.
    Python Code: .zip archive.
    Papers: (PhD Thesis, Chapter 3)
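    Example (illustrative Python sketch): the pipeline in miniature, sampling a subgraph by a random walk, ordering its nodes by a BFS, and taking the reordered adjacency matrix as the image; the specific BFS-based ordering used for the signature is the one in the ASONAM paper.

      import random
      import numpy as np
      import networkx as nx

      def signature(G, size=16, seed=0):
          rng = random.Random(seed)
          start = rng.choice(list(G.nodes()))
          walk, current = {start}, start
          while len(walk) < size:                        # random walk until `size` distinct nodes
              current = rng.choice(list(G.neighbors(current)))
              walk.add(current)
          sub = G.subgraph(walk)
          order = list(nx.bfs_tree(sub, start))          # BFS ordering from the walk's start node
          return nx.to_numpy_array(sub, nodelist=order)  # size x size 0/1 image

      G = nx.barabasi_albert_graph(1000, 3, seed=1)
      print(signature(G).shape)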

Subgraph Classification using Network Signatures
  • Identify the parent network of a small subgraph.
    Contributors: Kshiteesh Hegde, Malik Magdon-Ismail
    Summary: This package uses the novel structured image embedding feature to perform subgraph classification. Given a subgraph, you can predict its parent network. Please see the cited papers for more details.
    Python Code: .zip archive.
    Papers: (PhD Thesis, Chapter 4) (arXiv paper)

Network Lens: Node Classification in Topologically Heterogeneous Networks
  • This tool, called Network Lens, can be used to classify nodes in a topologically heterogeneous network.
    Contributors: Kshiteesh Hegde, Malik Magdon-Ismail
    Summary: This package uses the novel structured image embedding feature to perform node classification in topologically heterogeneous networks. Given a network, you can classify its nodes into networks the model has been trained on. One of the applications of this tool is clustering. Please see the cited papers for more details.
    Python Code: .zip archive.
    Papers: (PhD Thesis, Chapter 5)

Intrinsic Scale of a Network
  • Given a network, get its intrinsic scale, the scale at which it reveals itself.
    Contributors: Malik Magdon-Ismail, Kshiteesh Hegde
    Summary: This package uses the novel structured image embedding feature to extract the intrinsic scale of the given network. Networks reveal their identities at a small scale (8-20 nodes). This tool extracts the scale for a given network. Please see the cited papers for more details.
    Python Code: .zip archive.
    Papers: (PhD Thesis, Chapter 6)

Supervised Mixtures of Experts (SGMM, SBMM)
  • Given a dataset with continuous or binary numerical features, train either an SGMM or an SBMM model for classification.
    Contributors: Georgios Mavroudeas, Xiao Shou, Malik Magdon-Ismail
    Summary: This package trains two types of mixture-of-experts (MoE) models: one with a mixture-of-Gaussians gate and one with a mixture of Bernoullis. The expert (emission) function is preset to logistic regression, but it can be replaced with any classification model. There are options for per-cluster L1 and L2 regularization, as well as options to train with a mixture of labeled and unlabeled data.
    Python Code: .zip archive.
    Papers: (SGMM Paper)
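    Example (illustrative Python sketch): how a trained Gaussian-gated mixture of experts predicts, with the gate's posterior over clusters weighting per-cluster logistic-regression experts; the parameter names are hypothetical, and the training (EM, regularization, semi-supervision) is what the package provides.

      import numpy as np
      from scipy.stats import multivariate_normal

      def sgmm_predict_proba(X, weights, means, covs, expert_w, expert_b):
          """weights: (K,) mixing weights; means/covs: gate Gaussians; expert_w (K, d), expert_b (K,)."""
          K = len(weights)
          gate = np.column_stack([weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
                                  for k in range(K)])
          gate /= gate.sum(axis=1, keepdims=True)                       # posterior p(cluster k | x)
          experts = 1.0 / (1.0 + np.exp(-(X @ expert_w.T + expert_b)))  # p(y=1 | x, cluster k)
          return (gate * experts).sum(axis=1)                           # p(y=1 | x)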

HMM-Boost
  • Train a semi-supervised HMM (the HMM-Boost model) or a mixture of semi-supervised HMMs.
    Contributors: Georgios Mavroudeas, Malik Magdon-Ismail
    Summary: This package applies the HMM-Boost methodology to train HMM and mixture-of-HMM (MHMM) models using a semi-supervised algorithm. It also has the option to train a traditional unsupervised HMM. The observation model of the HMM is a mixture of Gaussians per state.
    Python Code: .zip archive.
    Papers: (HMM-BOOST paper)
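    Example (illustrative Python sketch): for orientation, an unsupervised HMM with Gaussian-mixture emissions fit with hmmlearn, matching the observation model described above; the semi-supervised HMM-Boost training is what the package adds.

      import numpy as np
      from hmmlearn.hmm import GMMHMM

      rng = np.random.default_rng(0)
      # one observation sequence of length 400 in 3 dimensions, two regimes
      X = np.vstack([rng.normal(0, 1, size=(200, 3)), rng.normal(3, 1, size=(200, 3))])

      model = GMMHMM(n_components=2, n_mix=2, covariance_type="diag", n_iter=50, random_state=0)
      model.fit(X)
      print(model.predict(X)[:10])     # decoded hidden states for the first few observations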

Pre Cluster and Merge (PCM)
  • Given a standard causal dataset, apply the PCM methodology and automatically uncover effect subpopulations.
    Contributors: Georgios Mavroudeas, Malik Magdon-Ismail
    Summary: This package uses the novel PCM algorithm to automatically extract effect clusters from causal populations.
    Python Code: .zip archive.
    Papers: (PCM work, submitted to ICLR 2023)