Jonathan Purnell
Lally Building, 03A Department of Computer Science
Rensselaer Polytechnic Institute
Email: purnej[at]cs.rpi.edu
Phone: (518) 276-8556

Home
Publications
Code
CV
Open Source Code:

December 21, 2010: This page contains the various open source libraries that I have developed for my research. Although this code has been primarily used for the particular experiments, they have been designed to be used for other tasks with little to no change needed in the library files. I am currently working to improve these libraries to be as robust and flexible as possible. Feel free to contact me if you have questions, comments, or problems using the code.

Sampled Spectral Distance Embedding (SSDE):

Originally developed by Ali Civril and Eli Bocke-Rivel, this code implements the SSDE method. A given (undirected and connected) graph is embedded into Cartesian space in such a way as to minimize the loss of information in the graph edges. The algorithm executes quickly due to clever sampling of the nodes and a matrix approximation method that runs in linear time with respect to the number of nodes in the graph.

SSDE Code
SSDE Documentation

Related Works:

Civril, A., Magdon-Ismail, M.
SSDE: Fast graph drawing using sampled spectral distance embedding, Graph Drawing, 2007.

Low-Rank Perturbation Gaussian Mixture Model (LRPGMM):

This code is a modified version of the typical EM algorithm used for training a Gaussian Mixture Model (GMM). The defining difference is the approximation of the covariance matrix for each mixture. The GMM can be trained using the full covariance matrix, a diagonal matrix approximation (i.e. only the variances), or using a low-rank perturbed diagonal matrix decomposition (LRPDD). With the LRPDD, the covariance matrix is represented by a linear number of parameters (with respect to the dimension of the data being modeled). This decomposition allows for a linear time (vs. quadratic or cubic) training the GMM and minimizes the overfitting that occurs with a full rank covariance matrix. Additionally, there is a naive overlapping clustering algorithm built-in to the code that clusters the data based on the posterior probabilities.
Note: This code relies on the ALGLIB library.
GMM Code
GMM Documentation

Related Works:

Purnell, J., Magdon-Ismail, M.,
"Approximating the Covariance Matrix with Low-Rank Perturbations",  IDEAL, 2010.

SSDE-Clustering:

Given a graph, the SSDE code can be applied to embed the graph into Carteisan coordinates. With these coordinates, a GMM can be fitted to the data. In addition to the GMM model, the posterior probabilites for each node is calculated. Finally the data is cluster in one of two ways. One method is to use the posterior probabilities to cluster the nodes in such a way as to create higher quality clusters than would be found using the GMM's naive clustering function. The quality of the clusters is based on a metric described in the related works. The other method is specify the average number of clusters a data sample is assigned to. The SSDE-Cluster code provides the code for Cluster, which clusters the data, and the SSDE-Cluster shell script which provides a wrapper for SSDE, GMM, and Cluster.

SSDE-Cluster Code
SSDE-Cluster Documentation

Related Works:

Purnell, J., Magdon-Ismail, M. "Fast Overlapping Clustering of Networks Using Sampled Spectral Distance Embedding and GMMs" CS Dept., RPI, 2010