Geometric tools for high-dimensional data analysis

Dr Ann B. Lee
Department of Mathematics, Yale University

Tuesday, January 25, 2005
DCC 330, 4:00 p.m. to 5:00 p.m.
Refreshments at 3:30 p.m.

In many applied fields, such as image analysis, information technology and biology, one has to analyze noisy but structured data in very high dimensions (more than 1,000, or even 10,000), often with only a small number of samples. This “large d, small N” regime presents challenges for data analysis and calls for efficient dimension-reduction tools that take the inherent geometry of natural data into account. In the first part of my talk, I will describe a multi-scale orthogonal basis that can be used for feature extraction from smooth data (such as images and spectral measurements) as well as non-smooth data (such as DNA microarrays and word-document arrays). In the second half, I will describe a general methodology for organizing high-dimensional data sets by embedding the data into Euclidean space via a non-linear diffusion map. Examples will be taken from image analysis, word-document clustering and spectroscopy.

Last updated: January 11, 2005