* Faculty       * Staff       * Students & Alumni       * Committees       * Contact       * Institute Directory
* Undergraduate Program       * Graduate Program       * Courses       * Institute Catalog      
* Undergraduate       * Graduate       * Institute Admissions: Undergraduate | Graduate      
* Colloquia       * Seminars       * News       * Events       * Institute Events      
* Overview       * Lab Manual       * Institute Computing      
No Menu Selected

* News


Thinking Sparsely: Scalable Tensor Decompositions for Multi-aspect Data Mining

Jimeng Sun
IBM TJ Watson lab

Thursday, April 10th
Troy 2018 - 4:00 p.m. to 5:00 p.m.
Refreshments at 3:30 p.m.


Modern applications generate large amounts of data with multiple aspects and high dimensionalities, such as Internet traffic, telecommunication records, and large-scale social networks.Tensors (i.e., multi-way arrays) provide a natural representation for such data. Consequently, tensor decompositions such as Tucker become important tools to summarize and analyze such data. One major challenge is how to deal with high-dimensional, sparse data. In other words, how do we compute decompositions of tensors where most of the entries of the tensor are zero. We see such behavior, e.g., in Internet traffic data where valid (non-zero) sender-recipient pairs are rare compared to all possible combinations, or in a social network where a person usually inter-connects with only a few people in a network. Thanks to relative low inter-connectivity (sparsity), extremely high-dimensional tensors can be stored with moderate hardware. Specialized techniques are needed for computing the Tucker decompositions for sparse tensors because standard algorithms do not account for the sparsity of the data. As a result, a surprising phenomenon is observed by practitioners: Despite the fact that there are enough memory to store both the input tensors and the factorized output tensors, memory overflows occur during tensor factorization process. To address this intermediate blowup problem, we propose the Memory-Efficient Tucker (MET). Based on the available memory, MET adaptively selects the right execution strategy during the decomposition. We provide quantitative and qualitative evaluation of MET on real tensors. It achieves over 1000X space reduction without sacrificing speed; it also allows us to work with much larger tensors that were too big to deal with before. Finally, we demonstrate some data mining case-studies using MET.


Dr. Jimeng Sun is a researcher at IBM TJ Watson lab. He received a Bachelor and MPhil in Computer Science from Hong Kong University of Science and Technology in 2002 and 2003. After that, he obtained a MS and PhD degree in Computer Science from Carnegie Mellon University in 2006 and 2007. His research interests include data mining on streams and networks, databases and service science. He has received the best research paper award in SIAM Data mining conference (SDM) 2007. He has published over 20 refereed articles and one book chapter. He filed four patents and has given four tutorials on top data mining and databases conferences.

Hosted by: Petros Drineas (x-8265)

Administrative support: Chris Coonrad (x8412)

For more information:

See the speaker's home page

Last updated: March 21st, 2008