Organiser: Dr David Millard
Time: 12/12/2005 12:50-13:40
The importance of understanding the process by which a result was generated in an experiment is fundamental to science. Without such information, other scientists cannot reproduce, analyse or validate experiments. Provenance is therefore important to enable a scientist to trace how a particular result has been arrived at.
Based on the common-sense definition of provenance, we propose a new definition suited to the computational model underpinning service-oriented architectures: the provenance of a piece of data is the process that led to the data. Since our aim is to conceive a computer-based representation of provenance that allows us to perform useful reasoning about the origin of results, we examine the nature of such a representation, which is articulated around the documentation of execution.
We then examine the architecture of a provenance system, centred around the notion of a provenance store designed to support the provenance lifecycle: during a recording phase, documentation of execution is archived in the provenance store, whereas a reasoning phase operates over the archived documentation. We then discuss, in turn, a protocol for recording execution documentation, a query facility to gain access to the contents of the store, and a reasoning system to make inferences. Realising such an architecture is particularly challenging for e-Science experiments, since it must be scalable.
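As a rough illustration of the lifecycle described above, the following minimal Python sketch shows a recording phase that archives documentation of execution in a store, a query facility over the store, and a simple reasoning step that traces the process that led to a piece of data. All names here (Assertion, ProvenanceStore, record, query, trace, and the example services) are hypothetical and chosen for illustration only; this is not the PASOA or EU Provenance interface, and an in-memory list stands in for what would need to be a scalable store.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """Hypothetical unit of execution documentation:
    a service consumed some inputs and produced one output."""
    service: str
    inputs: tuple
    output: str


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory stand-in for a provenance store."""
    assertions: list = field(default_factory=list)

    def record(self, service, inputs, output):
        # Recording phase: archive one piece of execution documentation.
        self.assertions.append(Assertion(service, tuple(inputs), output))

    def query(self, output):
        # Query facility: find the documented steps that produced `output`.
        return [a for a in self.assertions if a.output == output]


def trace(store, data_id):
    """Reasoning phase: collect every recorded step that led to `data_id`,
    following inputs backwards through the archived documentation."""
    steps, seen, pending = [], set(), [data_id]
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        for assertion in store.query(current):
            steps.append(assertion)
            pending.extend(assertion.inputs)
    return steps


# Illustrative workflow (made-up service and data names).
store = ProvenanceStore()
store.record("align", ["seq_a", "seq_b"], "alignment")
store.record("build_tree", ["alignment"], "tree")
store.record("unrelated", ["x"], "y")

services = [a.service for a in trace(store, "tree")]
print(services)
```

In this sketch the provenance of "tree" is the chain of recorded steps reaching back to the raw inputs, while the unrelated step is left out because nothing connects it to the result.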
The presentation will draw upon our experience in the PASOA (www.pasoa.org) and EU Provenance (www.gridprovenance.org) projects and will rely on explicit use cases derived from e-Science applications in the domains of bioinformatics, high energy physics, organ transplant management and aerospace engineering.