

Colloquia

Provenance in Scientific Databases

James Cheney
University of Edinburgh

Friday, April 13, 2007

Scientific results in many disciplines, especially biology, are now being collected into large databases that support "in silico" experiments. These databases are often constructed through both manual editing ("curation") and automatic data extraction, mining, or cleaning steps. Judgments about the quality of the data, and therefore about the reliability of results derived from it, ultimately rest on the choices made by the database curators. It is therefore essential to record provenance: information that explains how the database came to be the way it is. Because recording this information by hand is tedious and expensive for curators, we study the problem of building provenance tracking into the database system itself.
In this talk I will first present a simple but effective approach to provenance tracking for manually curated databases. I will then present recent work on the deeper issue of the expressiveness of provenance-tracking techniques for traditional relational query and update languages. These results provide a formal basis for comparing alternative approaches and offer useful semantic guarantees about the behavior of provenance-tracking techniques.

