

Generic Entity Resolution

Hector Garcia-Molina
Departments of Computer Science and Electrical Engineering
Stanford University

Wednesday, September 20, 2006
Folsom Library Fischbach Room, 11:00 a.m. to 12:00 p.m.
(Refreshments at 10:30 a.m.)


Entity resolution (ER) is a problem that arises in many information integration scenarios: we have two or more sources containing records on the same set of real-world entities (e.g., customers). However, there are no unique identifiers that tell us which records from one source correspond to those in the other sources. Furthermore, the records representing the same entity may have differing information; e.g., one record may have the address misspelled, while another may be missing some fields. An ER algorithm attempts to identify the matching records from multiple sources (i.e., those corresponding to the same real-world entity) and merges the matching records as best it can.

In this talk I will describe a "generic" ER approach where the functions for comparing and merging records are black boxes, invoked on pairs of records. I will describe a set of important properties of the black boxes that enable efficient ER. I will also introduce three algorithms for ER: one for the general case, one for the case where the properties hold, and one for the case where the computations can be distributed across multiple processors. If time permits, I will show some experimental comparisons of the algorithms, based on comparison shopping data provided by Yahoo.
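
To make the black-box idea concrete, here is a minimal Python sketch of a naive pairwise resolution loop. It is an illustration only, not one of the three algorithms from the talk. The names resolve, match, and merge, the record encoding (a frozenset of field/value pairs), and the sample data are all assumptions made for this example. The loop also discards the two source records after each merge, which is safe only under assumptions about the black boxes that the abstract does not spell out.

```python
from itertools import combinations

def resolve(records, match, merge):
    """Naive ER sketch: merge matching pairs until no pair matches.

    `match` and `merge` are treated as black boxes invoked on pairs of
    records, as in the abstract. Dropping the two source records after a
    merge is an assumption of this sketch, not something the talk states.
    """
    resolved = list(records)
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(resolved)), 2):
            if match(resolved[i], resolved[j]):
                merged = merge(resolved[i], resolved[j])
                # Replace the matching pair with the merged record, then rescan.
                resolved = [r for k, r in enumerate(resolved) if k not in (i, j)]
                resolved.append(merged)
                changed = True
                break
    return resolved

# Hypothetical black boxes: records are frozensets of (field, value) pairs.
def match(a, b):
    # Two records match if they share an identical name or phone value.
    return any(field in {"name", "phone"} for field, _ in a & b)

def merge(a, b):
    # Merging keeps every field/value pair seen in either record.
    return a | b

# Made-up sample records for illustration.
r1 = frozenset({("name", "J. Doe"), ("phone", "555-1234")})
r2 = frozenset({("name", "John Doe"), ("phone", "555-1234"), ("email", "jd@example.com")})
r3 = frozenset({("name", "John Doe"), ("city", "Troy")})
r4 = frozenset({("name", "A. Smith")})

result = resolve([r1, r2, r3, r4], match, merge)
```

In this example r1 and r3 share no field value directly, yet they end up in the same merged record because the merge of r1 and r2 carries the name that r3 matches on. Each merge shrinks the list by one record, so the loop terminates, but it rescans all pairs after every merge and is quadratic at best.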


Hector Garcia-Molina is the Leonard Bosack and Sandra Lerner Professor in the Departments of Computer Science and Electrical Engineering at Stanford University, Stanford, California. He was the chairman of the Computer Science Department from January 2001 to December 2004. From 1997 to 2001 he was a member of the President's Information Technology Advisory Committee (PITAC). From August 1994 to December 1997 he was the Director of the Computer Systems Laboratory at Stanford. From 1979 to 1991 he was on the faculty of the Computer Science Department at Princeton University, Princeton, New Jersey. His research interests include distributed computing systems, digital libraries, and database systems. He received a BS in electrical engineering from the Instituto Tecnologico de Monterrey, Mexico, in 1974. He received an MS in electrical engineering in 1975 and a PhD in computer science in 1979, both from Stanford University, Stanford, California. Garcia-Molina is a Fellow of the Association for Computing Machinery and of the American Academy of Arts and Sciences; is a member of the National Academy of Engineering; received the 1999 ACM SIGMOD Innovations Award; is on the Technical Advisory Board of DoCoMo Labs USA and Yahoo Search & Marketplace; is a Venture Advisor for Diamondhead Ventures; and is a member of the Board of Directors of Oracle and Kintera.

Hosted by: Petros Drineas (x8265)