|
|
 |
News
Colloquia
Thinking Sparsely: Scalable Tensor Decompositions for Multi-aspect Data
Mining
Jimeng Sun
IBM TJ Watson lab
Thursday, April 10th
Troy 2018 - 4:00 p.m. to 5:00 p.m.
Refreshments at 3:30 p.m.
Abstract:
Modern applications generate large amounts of data with multiple
aspects and high dimensionalities, such as Internet traffic,
telecommunication records, and large-scale social networks.Tensors
(i.e., multi-way arrays) provide a natural representation for such
data. Consequently, tensor decompositions such as Tucker become
important tools to summarize and analyze such data.
One major challenge is how to deal with high-dimensional, sparse data.
In other words, how do we compute decompositions of tensors where most
of the entries of the tensor are zero. We see such behavior, e.g., in
Internet traffic data where valid (non-zero) sender-recipient pairs
are rare compared to all possible combinations, or in a social network
where a person usually inter-connects with only a few people in a
network. Thanks to relative low inter-connectivity (sparsity),
extremely high-dimensional tensors can be stored with moderate
hardware. Specialized techniques are needed for computing the Tucker
decompositions for sparse tensors because standard algorithms do not
account for the sparsity of the data. As a result, a surprising
phenomenon is observed by practitioners: Despite the fact that there
are enough memory to store both the input tensors and the factorized
output tensors, memory overflows occur during tensor factorization
process. To address this intermediate blowup problem, we propose the
Memory-Efficient Tucker (MET). Based on the available memory, MET
adaptively selects the right execution strategy during the
decomposition. We provide quantitative and qualitative evaluation of
MET on real tensors. It achieves over 1000X space reduction without
sacrificing speed; it also allows us to work with much larger tensors
that were too big to deal with before. Finally, we demonstrate some
data mining case-studies using MET.
Bio:
Dr. Jimeng Sun is a researcher at IBM TJ Watson lab. He received a
Bachelor and MPhil in Computer Science from Hong Kong University of
Science and Technology in 2002 and 2003. After that, he obtained a MS
and PhD degree in Computer Science from Carnegie Mellon University in
2006 and 2007. His research interests include data mining on streams and
networks, databases and service science. He has received the best
research paper award in SIAM Data mining conference (SDM) 2007. He has
published over 20 refereed articles and one book chapter. He filed four
patents and has given four tutorials on top data mining and databases
conferences.
Hosted by: Petros Drineas (x-8265)
Administrative support: Chris Coonrad (x8412)
For more information:
See the speaker's home page
Last updated: March 21st, 2008
|
 |
|
|