Provenance in Scientific Databases
University of Edinburgh
Friday, April 13, 2007
Scientific results in many disciplines, especially biology, are now being collected into large databases that support "in silico" experiments. These databases are often built through a combination of manual editing (or "curation") and automatic data extraction, mining, or cleaning steps. Judgments about the quality of the data, and therefore about the reliability of results derived from it, ultimately rest on the choices made by the database curators. It is therefore essential to record information, called provenance, that explains how the database came to be the way it is. Because it is tedious and expensive for curators to record this information by hand, we study the problem of building provenance tracking into the database system itself.
In this talk I will first present a simple but effective approach to provenance tracking for manually curated databases. I will then present recent work on the deeper issue of the expressiveness of provenance-tracking techniques for traditional relational query and update languages. These results provide a formal basis for comparing alternative approaches and offer useful semantic guarantees about the behavior of provenance-tracking techniques.