Methods for efficient storage and indexing in XML databases.

dc.contributor.author: Runapongsa, Kanda
dc.contributor.advisor: Patel, Jignesh M.
dc.date.accessioned: 2016-08-30T15:27:55Z
dc.date.available: 2016-08-30T15:27:55Z
dc.date.issued: 2003
dc.identifier.uri: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:3106155
dc.identifier.uri: https://hdl.handle.net/2027.42/123943
dc.description.abstract: As the eXtensible Markup Language (XML) continues to increase in popularity, it is clear that large repositories of XML data sets will emerge in the near future. Existing techniques for XML query processing are not very efficient and are unlikely to scale well to large data sets. In this thesis, we address these shortcomings and investigate various aspects of query processing on large XML data sets. In the first part of the thesis, we propose an algorithm, called XORator, that uses schema information to map XML documents to constructs in an Object-Relational Database Management System (ORDBMS). We compare the effectiveness of the XORator algorithm with that of an algorithm that maps XML data into a Relational Database Management System (RDBMS), and show that the XORator technique results in significant improvements in query response times. In the second part of this thesis, we propose a technique, called PAID, for storing XML data independently of XML schemas. The PAID technique extends a previous numbering scheme by including path information and a pointer to the node's parent element. Our experimental results demonstrate that the PAID technique is more efficient than existing strategies by several orders of magnitude. The third part of this thesis presents an XML Index Selection Tool (XIST) that examines a combination of database query workload, data statistics, and XML schemas to recommend a set of indices that are beneficial to build. Our experiments show that XIST produces index recommendations that are more efficient than those produced by existing techniques. The final part of this thesis presents a micro-benchmark, called the Michigan benchmark, that can be used to evaluate the performance of XML database systems. The benchmark is an engineers' benchmark, designed to pinpoint the strengths and weaknesses of the individual components that make up a DBMS.
We have used the benchmark to test three databases and to identify the factors that are critical to performance in these DBMSs. Collectively, our research presents techniques for efficiently managing large XML data repositories. Although the bulk of the thesis focuses on techniques applicable to commercial ORDBMSs, many of these methods, such as the XIST tool, can also be adapted for native XML systems.
dc.format.extent: 216 p.
dc.language: English
dc.language.iso: EN
dc.subject: Databases
dc.subject: Efficient
dc.subject: Indexing
dc.subject: Methods
dc.subject: Storage
dc.subject: Xml
dc.title: Methods for efficient storage and indexing in XML databases.
dc.type: Thesis
dc.description.thesisdegreename: PhD [en_US]
dc.description.thesisdegreediscipline: Applied Sciences
dc.description.thesisdegreediscipline: Computer science
dc.description.thesisdegreediscipline: Electrical engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/123943/2/3106155.pdf
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
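
The abstract describes PAID only at a high level: it extends a previous numbering scheme with path information and a pointer to each node's parent element. The Python sketch below illustrates that idea under stated assumptions and is not the thesis's implementation. It assumes the "previous numbering scheme" is a region (start, end, level) encoding. It represents path information as an integer id per distinct root-to-node tag path, and it stores the parent pointer as the parent's start number. The names NodeRecord, number_document, is_ancestor, is_parent and nodes_on_path are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple
import xml.etree.ElementTree as ET


@dataclass
class NodeRecord:
    # Hypothetical per-element record; the real PAID layout is not given here.
    tag: str
    start: int             # region start: pre-order counter value
    end: int               # region end: counter value after all descendants
    level: int             # depth in the tree, root = 0
    path_id: int           # assumed encoding of "path information"
    parent: Optional[int]  # assumed parent pointer: the parent's start number


def number_document(xml_text: str) -> Tuple[List[NodeRecord], Dict[str, int]]:
    """Assign region numbers, path ids and parent pointers to every element."""
    root = ET.fromstring(xml_text)
    records: List[NodeRecord] = []
    path_ids: Dict[str, int] = {}  # e.g. "/book/chapter" -> 1
    counter = 0

    def visit(elem: ET.Element, level: int, parent_start: Optional[int], parent_path: str) -> None:
        nonlocal counter
        path = f"{parent_path}/{elem.tag}"
        pid = path_ids.setdefault(path, len(path_ids))  # new paths get the next id
        start = counter
        counter += 1
        rec = NodeRecord(elem.tag, start, -1, level, pid, parent_start)
        records.append(rec)
        for child in elem:  # element children only
            visit(child, level + 1, start, path)
        rec.end = counter  # closes the region after all descendants
        counter += 1

    visit(root, 0, None, "")
    return records, path_ids


def is_ancestor(a: NodeRecord, d: NodeRecord) -> bool:
    # Region containment: d's interval lies strictly inside a's interval.
    return a.start < d.start and d.end < a.end


def is_parent(p: NodeRecord, c: NodeRecord) -> bool:
    # Parent-child test via the stored parent pointer.
    return c.parent == p.start


def nodes_on_path(records: List[NodeRecord], path_ids: Dict[str, int], path: str) -> List[NodeRecord]:
    # A simple root-to-leaf path query becomes one equality test on path_id.
    pid = path_ids.get(path)
    return [r for r in records if r.path_id == pid] if pid is not None else []
```

For example, `number_document("<book><chapter><title/><para/></chapter><chapter><title/></chapter></book>")` assigns the two `title` elements start numbers 2 and 8, and `nodes_on_path(..., "/book/chapter/title")` returns both of them. In this sketch, the region numbers answer ancestor-descendant questions by interval containment. The parent pointer answers parent-child questions with a single comparison. The path id lets a simple path expression be evaluated as one lookup rather than a chain of structural joins.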
