* Faculty       * Staff       * Students & Alumni       * Committees       * Contact       * Institute Directory
* Undergraduate Program       * Graduate Program       * Courses       * Institute Catalog      
* Undergraduate       * Graduate       * Institute Admissions: Undergraduate | Graduate      
* Colloquia       * Seminars       * News       * Events       * Institute Events      
* Overview       * Lab Manual       * Institute Computing      
No Menu Selected

* News


Subset Selection for Linear Regression

David Kempe

December 1, 2011
JEC 3117 - 4:00 p.m. to 5:00 p.m.


One of the central problems in many data-driven sciences is the right selection of attributes to observe in order to accurately predict a variable of interest. Applications abound in areas as diverse as medical sciences (predicting medical conditions based on observable attributes), social sciences (predicting future behaviors or outcomes) and sensor networks, among others. In many of these cases, time or cost constraints prohibit sampling more than a few attributes. Also, many application domains use linear regression as a method of prediction, and evaluate the quality of attributes in terms of the R^2 fit with the quantity to be predicted. This motivates the following formal problem definition: "Given the covariances between observable variables X_i and a target variable Z, select k of the variables X_i such that the selected set has the best possible R^2 fit with Z." The main result presented in this talk is that so long as the covariance matrix between the X_i variables is far from singular, greedy algorithms frequently used in practice are provably constant-factor approximations. The proof is based on extending the widely used concept of submodularity to a notion of approximate submodularity, and relating it to the spectrum of the covariance matrix. Furthermore, we will investigate various graph-theoretical properties of covariance matrices which allow for efficient exact or approximate algorithms. We conclude with several exciting open questions. [This talk is based on joint work with Abhimanyu Das, appearing in ICML 2011 and STOC 2008.]

Hosted by: Dr. Elliot Anshelevich

Last updated: October 24, 2011