Methods for efficient storage and indexing in XML databases.

Runapongsa, Kanda

dc.contributor.author	Runapongsa, Kanda
dc.contributor.advisor	Patel, Jignesh M.
dc.date.accessioned	2016-08-30T15:27:55Z
dc.date.available	2016-08-30T15:27:55Z
dc.date.issued	2003
dc.identifier.uri	http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:3106155
dc.identifier.uri	https://hdl.handle.net/2027.42/123943
dc.description.abstract	As the eXtensible Markup Language (XML) continues to increase in popularity, it is clear that large repositories of XML data sets will emerge in the near future. Existing techniques for XML query processing are not very efficient and unlikely to scale well with large data sets. In this thesis, we address these shortcomings and investigate various aspects of query processing on large XML data sets. In the first part of the thesis, we propose an algorithm, called XORator, that maps documents based on their schema information into constructs in an Object-Relational Database Management System (ORDBMS). We compare the effectiveness of the XORator algorithm with an algorithm that maps XML data in a Relational Database Management System (RDBMS) and show that the XORator technique results in significant improvements in query response times. In the second part of this thesis, we propose a technique, called PAID, for storing XML data, independent of XML schemas. The PAID technique extends a previous numbering scheme by including path information and a pointer to the node's parent element. Our experimental results demonstrate that the PAID technique is more efficient than existing strategies by several orders of magnitude. The third part of this thesis presents an XML Index Selection Tool (XIST) that examines a combination of database query workload, data statistics, and XML schemas to suggest a set of indices that are beneficial to build. Our experiments show that XIST produces index recommendations that are more efficient than those produced by existing techniques. The final part of this thesis presents a micro-benchmark, called the Michigan benchmark, that can be used to evaluate the performance of XML database systems. The benchmark is an engineers' benchmark and is designed to pinpoint the strengths and weaknesses of individual components that constitute the entire DBMS. We have used the benchmark to test three databases and understand the factors that are critical to the performance in these DBMSs. Collectively, our research presents techniques for efficiently managing large XML data repositories. Although the bulk of the thesis has focused on techniques applicable to commercial ORDBMSs, many of these methods, such as the XIST tool, can also be adapted for native XML systems.
dc.format.extent	216 p.
dc.language	English
dc.language.iso	EN
dc.subject	Databases
dc.subject	Efficient
dc.subject	Indexing
dc.subject	Methods
dc.subject	Storage
dc.subject	XML
dc.title	Methods for efficient storage and indexing in XML databases.
dc.type	Thesis
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Applied Sciences
dc.description.thesisdegreediscipline	Computer science
dc.description.thesisdegreediscipline	Electrical engineering
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/123943/2/3106155.pdf
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: 3106155.pdf
Size: 9.307MB
Format: PDF
Description: Access Restricted to UM users only.
