Abstract

It is important to understand how a particular result was derived, for reasons such as credibility assessment. Provenance, the information that describes the entities, people and activities involved in producing a piece of data, can address this need. Provenance of data can be created by logging application actions and kept in persistent storage. Later, this storage can be queried and the result fed back to the user or the system. Together these steps form the provenance lifecycle. This lifecycle, however, does not consider domain-specific interpretation of provenance. To address this issue, we put forth a two-level solution and revise the provenance lifecycle into an annotation lifecycle. With this framework, provenance of data can be interpreted in a domain-specific way and the result fed back to the system.

Introduction

In many situations it is important to know how an object came to be, who is responsible for its creation, and which processes were involved in generating it. This information may be needed in applications such as credibility assessment, trust judgement, and decision making. Provenance is a concept that can deliver this information for an object: it is a record that describes the people, entities, and activities involved in producing the object. Applications can follow the provenance lifecycle to support a notion of provenance in their systems. The provenance lifecycle consists of four phases. First, the application creates provenance data by logging its actions (Create Provenance Description phase). This provenance data may be kept in persistent storage (Record phase). The provenance storage can then be queried for various purposes, such as auditing (Query phase). Finally, the result of the query can be fed back to the user or to the system itself for further processing (Feedback phase). The following figure illustrates the provenance lifecycle.
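The four phases can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout (a hypothetical `ProvRecord` with entity, activity, agent and used fields), the function names, and the use of an in-memory SQLite table as a stand-in for persistent storage are all illustrative assumptions.

```python
import json
import sqlite3
from dataclasses import dataclass


@dataclass
class ProvRecord:
    """Hypothetical provenance description of one application action."""
    entity: str    # the data item that was produced
    activity: str  # the process that produced it
    agent: str     # the person responsible for it
    used: list     # input entities consumed by the activity


def create_description(entity, activity, agent, used):
    """Create Provenance Description phase: log an application action."""
    return ProvRecord(entity, activity, agent, list(used))


def open_store():
    """Persistent storage stand-in: an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prov (entity TEXT, activity TEXT, agent TEXT, used TEXT)")
    return conn


def record(conn, rec):
    """Record phase: persist the description."""
    conn.execute(
        "INSERT INTO prov VALUES (?, ?, ?, ?)",
        (rec.entity, rec.activity, rec.agent, json.dumps(rec.used)),
    )
    conn.commit()


def query(conn, entity):
    """Query phase: look up how a given entity was produced."""
    row = conn.execute(
        "SELECT entity, activity, agent, used FROM prov WHERE entity = ?", (entity,)
    ).fetchone()
    if row is None:
        return None
    return ProvRecord(row[0], row[1], row[2], json.loads(row[3]))


def feedback(rec):
    """Feedback phase: turn a query result into a message for the user."""
    return (f"{rec.entity} was produced by {rec.activity} "
            f"(agent: {rec.agent}) from {', '.join(rec.used)}")


# Usage: one logged action flows through all four phases.
conn = open_store()
record(conn, create_description("report.pdf", "generate_report", "alice", ["sales.csv"]))
print(feedback(query(conn, "report.pdf")))
# prints: report.pdf was produced by generate_report (agent: alice) from sales.csv
```

Keeping each phase in its own function mirrors the lifecycle: the storage backend or the feedback format could be swapped without touching the other phases.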

The creation, record, and feedback phases happen at application runtime, whereas the query phase happens after application execution. Because provenance data can be kept in persistent storage for a long time, another application may later query the provenance storage with requirements different from the original ones and then feed the result back to the system. In that case the requirements have changed and are now domain specific.

We need a principled way of managing this domain-specific interpretation of provenance data. We put forth a two-level approach to the issue of interpretation: an annotation level and an inference level. At the first level, annotation is used as a generic mechanism that enables users to attach any information to elements of a provenance graph. At the second level, the inference level, new annotations are inferred from existing annotations.
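As a rough illustration of the two levels, the sketch below stores annotations as free-form key/value pairs on graph elements (annotation level) and applies rule functions repeatedly until no new annotation appears (inference level). The class `ProvenanceGraph`, the rule `certified_author_rule`, and the `certified`/`credible` keys are hypothetical; the relation name `wasAttributedTo` is borrowed from the W3C PROV data model only for readability.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    # Edges as (source, relation, target) triples.
    edges: list = field(default_factory=list)
    # Annotation level: element id -> {key: value}; any information may be attached.
    annotations: dict = field(default_factory=dict)

    def annotate(self, element, key, value):
        """Attach an arbitrary annotation to a graph element."""
        self.annotations.setdefault(element, {})[key] = value

    def get(self, element, key):
        return self.annotations.get(element, {}).get(key)


def certified_author_rule(graph):
    """Hypothetical domain rule: an entity attributed to a certified agent is credible."""
    inferred = []
    for src, rel, dst in graph.edges:
        if rel == "wasAttributedTo" and graph.get(dst, "certified") is True:
            inferred.append((src, "credible", True))
    return inferred


def infer(graph, rules):
    """Inference level: apply rules until no annotation changes.

    Assumes the rules do not contradict each other; conflicting rules
    could otherwise keep overwriting one another forever.
    """
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for element, key, value in rule(graph):
                if graph.get(element, key) != value:
                    graph.annotate(element, key, value)
                    changed = True
    return graph


# Usage: a user annotates an agent; a new annotation is inferred for an entity.
g = ProvenanceGraph(edges=[("report", "wasAttributedTo", "alice"),
                           ("draft", "wasAttributedTo", "bob")])
g.annotate("alice", "certified", True)
infer(g, [certified_author_rule])
print(g.get("report", "credible"))  # True
print(g.get("draft", "credible"))   # None
```

Because rules are plain functions over the annotated graph, a different domain could supply its own rule set without changing how annotations are stored.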

We revise the provenance lifecycle to include the two proposed levels and call the result the Annotation Lifecycle. Annotations are created at annotation creation time, which follows application runtime. After application execution, the inference level can be run to infer new annotations. Inference may involve propagating existing annotations and computing over them to produce new ones. These annotations can then be queried and the results fed back to the system. The following figure illustrates the annotation lifecycle.
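One possible instance of propagation and computation is sketched below, assuming a hypothetical numeric `confidence` annotation. Entities are visited in topological order of an acyclic derivation graph using Python's standard `graphlib.TopologicalSorter` (available since Python 3.9); each entity's inferred confidence is computed from its own annotation and those of its inputs, and a query over the inferred annotations yields a result that could be fed back to the system. The scoring rule (own confidence times the weakest input) is an assumption chosen for illustration, not the method proposed here.

```python
from graphlib import TopologicalSorter


def propagate_confidence(derived_from, local_confidence):
    """Propagate a hypothetical 'confidence' annotation along derivation edges.

    derived_from: entity -> list of input entities it was derived from
    local_confidence: entity -> confidence annotated at creation time
    Returns the inferred confidence of every entity in the graph.
    """
    inferred = {}
    # static_order() yields each entity only after all of its inputs.
    for entity in TopologicalSorter(derived_from).static_order():
        own = local_confidence.get(entity, 1.0)
        inputs = derived_from.get(entity, [])
        # Assumed computation: weakest input, scaled by the entity's own confidence.
        upstream = min((inferred[i] for i in inputs), default=1.0)
        inferred[entity] = own * upstream
    return inferred


def query_low_confidence(inferred, threshold):
    """Query over inferred annotations; the result could be fed back to the system."""
    return sorted(e for e, c in inferred.items() if c < threshold)


# Usage: two raw inputs are cleaned, then summarised into a report.
derived_from = {"report": ["cleaned"], "cleaned": ["raw_a", "raw_b"]}
local = {"raw_a": 0.9, "raw_b": 0.5, "cleaned": 1.0, "report": 0.8}
inferred = propagate_confidence(derived_from, local)
print(query_low_confidence(inferred, 0.6))  # ['cleaned', 'raw_b', 'report']
```

Running inference after application execution, as here, keeps the scoring rule out of the application itself, so it can be changed later to meet new domain-specific requirements.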