XML query optimization in Timber.
dc.contributor.author | Wu, Yuqing | |
dc.contributor.advisor | Jagadish, Hosagrahar V. | |
dc.date.accessioned | 2016-08-30T15:35:52Z | |
dc.date.available | 2016-08-30T15:35:52Z | |
dc.date.issued | 2004 | |
dc.identifier.uri | http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:3137966 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/124340 | |
dc.description.abstract | With the growing importance of XML as a format for data representation and data transport, being able to process XML queries efficiently becomes a topic of interest. Relational database techniques, even though mature, don't always apply in an XML context, due to the special features of XML data and XML queries. In this thesis, we study these features and investigate various aspects of query processing and query optimization in the framework of the Timber native XML database system. We evaluate XML queries by breaking them into pattern trees, which specify the nodes to match, and the desired relationships between the nodes. The structural relationships are evaluated via structural joins. We introduce two families of structural join algorithms (Tree-Marge and Stack-Based), which take advantage of the (DocId, StartPos : EndPos, LevelNum) representation of positions of XML components and perform structural join efficiently, with time, space and I/O complexity linear to the input and output lists. In this thesis, we focus on cost-based optimization for pattern tree matching. We propose five structural join order selection algorithms and show that fully-pipelined plans are possible for any given query pattern. Moreover, left-deep plans, which are chosen as the rule-of-thumb in a relational database, are not desirable. Relaxing the fully-indexed assumption brings issues related to data instantiation into the picture. The challenge is that in XML databases, the data associated with an element can vary. We define data instantiation as a first-class physical operator and take into consideration both the placement and granularity of data instantiation while optimizing pattern tree matching. Accurate cost-based optimization depends on accurate result size estimation. We propose a novel summary structure, <italic>position histogram</italic>, to capture the position relationship between nodes in an XML document. Experiments demonstrate that the <italic>Ph-Hist</italic> algorithm is capable of estimating the result size accurately, for both single twig patterns and patterns that are arbitrarily complex. Collectively, we are trying to address issues of storing and querying XML data efficiently, in the context of the Timber project. However, the theories behind the techniques proposed in this thesis apply irrespective to how the XML data is stored. | |
dc.format.extent | 177 p. | |
dc.language | English | |
dc.language.iso | EN | |
dc.subject | Query Optimization | |
dc.subject | Structural Join | |
dc.subject | Timber | |
dc.subject | Xml | |
dc.title | XML query optimization in Timber. | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Applied Sciences | |
dc.description.thesisdegreediscipline | Computer science | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/124340/2/3137966.pdf | |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |
Files in this item
Remediation of Harmful Language
The University of Michigan Library aims to describe library materials in a way that respects the people and communities who create, use, and are represented in our collections. Report harmful or offensive language in catalog records, finding aids, or elsewhere in our collections anonymously through our metadata feedback form. More information at Remediation of Harmful Language.
Accessibility
If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.