Provenance in Modifiable Datasets.

Zhang, Jing

dc.contributor.author	Zhang, Jing	en_US
dc.date.accessioned	2012-10-12T15:25:29Z
dc.date.available	NO_RESTRICTION	en_US
dc.date.available	2012-10-12T15:25:29Z
dc.date.issued	2012	en_US
dc.date.submitted	2012	en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/94013
dc.description.abstract	The provenance of derived data, which explains the derivation and retrieves or captures the source data, is valuable to data consumers for a variety of purposes, e.g., audit requirements, error tracing, and data reproduction. The provenance of a derived datum should include all the details of how it was derived, including, in particular, the source data used in its derivation. The provenance of a derived datum can be recorded during the original derivation process, but storing it explicitly can incur a very high storage cost. Therefore, techniques have been developed to record only a small amount of information, which can be used later to retrieve the full provenance from the source dataset. Such retrieval relies on the provenance still being present in the dataset so that tracing queries can retrieve it. However, many datasets are subject to modification, e.g., when new experimental data is collected and stored. In this thesis, we investigate the retrieval of the provenance of a derived datum from a modifiable dataset. Specifically, we consider the following four questions: (i) Can we explain what a particular derived datum depends on, even if a value used in its derivation has since been modified? (ii) Can we determine whether a particular derived datum is still valid after the source dataset has been modified, by examining its provenance rather than performing full view maintenance? (iii) Can we retrieve only part of the provenance of a given datum, either at the users' request or because the rest of the provenance is missing? (iv) Can we retrieve the provenance of a derived datum without predefined granularity in an unstructured dataset? We provide affirmative answers to these questions in the form of new techniques that use limited space and computational effort.	en_US
dc.language.iso	en_US	en_US
dc.subject	Database	en_US
dc.subject	Provenance	en_US
dc.title	Provenance in Modifiable Datasets.	en_US
dc.type	Thesis	en_US
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Computer Science & Engineering	en_US
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies	en_US
dc.contributor.committeemember	Jagadish, Hosagrahar V.	en_US
dc.contributor.committeemember	Hedstrom, Margaret L.	en_US
dc.contributor.committeemember	Lefevre, Kristen R.	en_US
dc.contributor.committeemember	Cafarella, Michael John	en_US
dc.subject.hlbsecondlevel	Computer Science	en_US
dc.subject.hlbtoplevel	Engineering	en_US
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/94013/1/jingzh_1.pdf
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: jingzh_1.pdf
Size: 1.816MB
Format: PDF
