SpeakEasy Quick Facts

Why?

Are you tired of clustering results that are unstable? Do you want to be confident the clusters you detect will be replicated in future datasets? Do you have huge biological or social network data, and you want to use a method that will detect communities quickly without a bunch of parameters to tweak? We think SpeakEasy can be your solution.

What?

SpeakEasy is a consensus clustering method, which means it clusters the data multiple times and looks for consistent results. You can read about how it identifies clusters and the many applications in this paper.

Co-occurrence Matrix.
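
To make the idea of "looking for consistent results" concrete, here is a minimal sketch of how a co-occurrence matrix can be built from several clustering runs. It is a generic illustration, not the SpeakEasy code: the function name `co_occurrence_matrix` and the toy label vectors are assumptions made for this example.

```python
def co_occurrence_matrix(partitions):
    """Fraction of runs in which each pair of nodes lands in the same cluster.

    partitions: list of label lists, one list per clustering run.
    (Hypothetical helper for illustration; not part of SpeakEasy.)
    """
    n_runs = len(partitions)
    n_nodes = len(partitions[0])
    counts = [[0] * n_nodes for _ in range(n_nodes)]
    for labels in partitions:
        for i in range(n_nodes):
            for j in range(n_nodes):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Convert raw counts into fractions of runs.
    return [[c / n_runs for c in row] for row in counts]


# Three toy runs over four nodes. Label names differ between runs,
# which does not matter: only "same cluster or not" is counted.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
co = co_occurrence_matrix(runs)
for row in co:
    print(row)
```

Pairs with values near 1 were grouped together in almost every run, so they are the stable part of the clustering; pairs with middling values are the ones that shift from run to run.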


How?

SpeakEasy uses a label propagation approach that is very fast: it can cluster full matrices (completely connected networks) of 5,000-10,000 nodes on a typical laptop. If you are clustering a sparse network with the typical 1% connectivity, we've tried networks of 300,000 nodes, and they typically run in under a minute on a typical laptop.
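
For readers new to label propagation, the sketch below shows the textbook version of the idea: every node starts with its own label and repeatedly adopts the label most common among its neighbours. This is only a generic illustration under our own assumptions (the function name `label_propagation`, the tie-breaking rule, and the toy graph are all made up here); SpeakEasy's actual update rule is described in the paper and is not reproduced in this sketch.

```python
import random
from collections import Counter


def label_propagation(adjacency, max_sweeps=100, seed=0):
    """Generic asynchronous label propagation on an undirected graph.

    adjacency: dict mapping each node to a list of its neighbours.
    Returns a dict mapping each node to its final community label.
    (Illustrative only; not the SpeakEasy update rule.)
    """
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # every node starts alone
    nodes = list(adjacency)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)  # visit nodes in a fresh random order each sweep
        changed = False
        for node in nodes:
            counts = Counter(labels[nb] for nb in adjacency[node])
            if not counts:
                continue  # isolated node keeps its own label
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if labels[node] in best:
                continue  # current label is already among the most common
            labels[node] = rng.choice(best)  # break ties at random
            changed = True
        if not changed:
            break  # a full sweep with no changes: labels are stable
    return labels


# Toy graph: two 4-node cliques joined by a single bridge edge (3-4).
graph = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
communities = label_propagation(graph)
print(communities)
```

Because ties are broken at random, different seeds can give different answers on the same graph, which is exactly why running the clustering many times and keeping the consistent part is useful.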

If you have really large networks or you want to cluster results many times, you can distribute the clustering runs across many CPUs, because each clustering run is independent. We're working on a parallel version of SpeakEasy that will be posted soon.
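
Since the runs are independent, spreading them over CPUs needs no coordination between workers. The sketch below shows one way this could look in Python using the standard library's `ProcessPoolExecutor`. It is not the parallel version of SpeakEasy mentioned above: `cluster_once` is a hypothetical placeholder that returns seeded random labels so the example runs on its own, and `cluster_many` and `N_NODES` are likewise names invented for this illustration.

```python
import random
from concurrent.futures import ProcessPoolExecutor

N_NODES = 6  # toy problem size, chosen only for this example


def cluster_once(seed):
    """Placeholder for one clustering run; swap in the real call here.

    Hypothetical stand-in: returns seeded random labels so that the
    example is runnable without the SpeakEasy code.
    """
    rng = random.Random(seed)
    return [rng.randrange(2) for _ in range(N_NODES)]


def cluster_many(seeds, max_workers=None):
    """Run cluster_once for every seed in separate worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the seeds.
        return list(pool.map(cluster_once, seeds))


if __name__ == "__main__":
    partitions = cluster_many(range(8))
    print(len(partitions), "independent runs finished")
```

Giving each run its own seed keeps the runs reproducible while still letting them differ, and the list of label vectors that comes back is the natural input for a consensus step.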

Cell-type Clustering.


Proof?

We've used SpeakEasy to cluster many types of biological datasets, which tend to be very noisy and difficult to cluster well. We think the results are excellent, based on comparing them to "gold standard" communities and on checking the density of within-cluster vs. between-cluster connections. We've also tested it against the LFR benchmarks, and in most cases it shows the most accurate cluster recovery of any method published to date.
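
If you want to try the within-cluster vs. between-cluster density check on your own results, a minimal sketch could look like the following. The function name `connection_densities` and the toy network are assumptions made for this example, and the check is written for an unweighted, undirected network stored as a 0/1 matrix.

```python
def connection_densities(adjacency, labels):
    """Return (within, between) edge densities for an undirected graph.

    adjacency: square 0/1 matrix as a list of lists (symmetric, zero diagonal).
    labels: cluster label for each node.
    (Hypothetical helper for illustration.)
    """
    n = len(labels)
    within_edges = within_pairs = 0
    between_edges = between_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            if labels[i] == labels[j]:
                within_pairs += 1
                within_edges += adjacency[i][j]
            else:
                between_pairs += 1
                between_edges += adjacency[i][j]
    within = within_edges / within_pairs if within_pairs else 0.0
    between = between_edges / between_pairs if between_pairs else 0.0
    return within, between


# Toy network: two triangles joined by one edge (2-3).
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
within, between = connection_densities(A, [0, 0, 0, 1, 1, 1])
print(within, between)
```

A good clustering should give a within-cluster density well above the between-cluster density; when the two numbers are close, the clusters are not capturing much structure.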

Where?

You can request the code to run SpeakEasy at the Download page. We're working to update this with an R version.