Colloquia
Generic Entity Resolution
Hector Garcia-Molina
Departments of Computer Science and Electrical
Engineering
Stanford University
Wednesday, September 20, 2006
Folsom Library Fischbach Room, 11:00 a.m. to 12:00 p.m.
(Refreshments at 10:30 a.m.)
Abstract:
Entity resolution (ER) is a problem that arises in many information
integration scenarios: We have two or more sources containing records
on the same set of real-world entities (e.g., customers). However,
there are no unique identifiers that tell us what records from one
source correspond to those in the other sources. Furthermore, the
records representing the same entity may have differing information,
e.g., one record may have the address misspelled, another record may
be missing some fields. An ER algorithm attempts to identify the
matching records from multiple sources (i.e., those corresponding to
the same real-world entity), and merges the matching records as best
it can.
In this talk I will describe a "generic" ER approach where the
functions for comparing and merging records are black-boxes, invoked
on pairs of records. I will describe a set of important properties
of the black-boxes that enable efficient ER. I will also introduce
three algorithms for ER: one for the general case, one for the case
where the properties hold, and one for the case where the computations can
be distributed across multiple processors. If time permits, I will show some
experimental comparisons of the algorithms, based on comparison
shopping data provided by Yahoo.
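As a rough illustration of the black-box idea in the abstract, the sketch below treats the comparison and merge functions as opaque callables invoked on pairs of records. It is a naive fixed-point loop written for this announcement, not one of the three algorithms from the talk. The record representation and the names `resolve`, `same_email`, and `union_merge` are assumptions made for the example, and the loop only gives sensible results when the two functions are well behaved (for instance, when merging never loses the information that caused a match).

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical record model: a set of (field, value) pairs.
Record = FrozenSet[Tuple[str, str]]


def resolve(
    records: List[Record],
    match: Callable[[Record, Record], bool],
    merge: Callable[[Record, Record], Record],
) -> List[Record]:
    """Naive generic ER: match and merge are black boxes called on pairs."""
    pending = list(records)
    resolved: List[Record] = []
    while pending:
        current = pending.pop()
        # Find any already-resolved record that the black box says matches.
        partner = next((r for r in resolved if match(current, r)), None)
        if partner is None:
            resolved.append(current)
        else:
            # Replace the pair by its merge and re-examine the merged record,
            # since it may now match records that neither input matched.
            resolved.remove(partner)
            pending.append(merge(current, partner))
    return resolved


def same_email(a: Record, b: Record) -> bool:
    """Example match function (an assumption): records share an email value."""
    emails_a = {v for f, v in a if f == "email"}
    emails_b = {v for f, v in b if f == "email"}
    return bool(emails_a & emails_b)


def union_merge(a: Record, b: Record) -> Record:
    """Example merge function (an assumption): keep every field value seen."""
    return a | b


# Made-up sample data: two sources describing overlapping customers.
r1: Record = frozenset({("name", "Ann Lee"), ("email", "ann@example.com")})
r2: Record = frozenset(
    {("name", "A. Lee"), ("email", "ann@example.com"), ("phone", "555-0100")}
)
r3: Record = frozenset({("name", "Bob Ray"), ("email", "bob@example.com")})

result = resolve([r1, r2, r3], same_email, union_merge)
```

Here `result` holds two records: the union of `r1` and `r2`, and `r3` unchanged. Because `resolve` never inspects the records itself, swapping in a different comparison or merge function requires no change to the loop.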
Bio:
Hector Garcia-Molina is the Leonard Bosack and Sandra Lerner
Professor in the Departments of Computer Science and Electrical
Engineering at Stanford University, Stanford, California. He was the
chairman of the Computer Science Department from January 2001 to
December 2004. From 1997 to 2001 he was a member of the President's
Information Technology Advisory Committee (PITAC). From August 1994
to December 1997 he was the Director of the Computer Systems
Laboratory at Stanford. From 1979 to 1991 he was on the faculty of
the Computer Science Department at Princeton University, Princeton,
New Jersey. His research interests include distributed computing
systems, digital libraries and database systems. He received a BS in
electrical engineering from the Instituto Tecnologico de Monterrey,
Mexico, in 1974. From Stanford University, Stanford, California, he
received an MS in electrical engineering in 1975 and a PhD in computer
science in 1979. Garcia-Molina is a Fellow of the Association for
Computing Machinery and of the American Academy of Arts and Sciences;
is a member of the National Academy of Engineering; received the 1999
ACM SIGMOD Innovations Award; is on the Technical Advisory Boards of
DoCoMo Labs USA and Yahoo Search & Marketplace; is a Venture Advisor for
Diamondhead Ventures; and is a member of the Board of Directors of
Oracle and Kintera.
Hosted by: Petros Drineas (x8265)