
XML data warehousing.

dc.contributor.author: Wiwatwattana, Nuwee
dc.contributor.advisor: Jagadish, Hosagrahar V.
dc.date.accessioned: 2016-08-30T16:21:15Z
dc.date.available: 2016-08-30T16:21:15Z
dc.date.issued: 2007
dc.identifier.uri: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:3276329
dc.identifier.uri: https://hdl.handle.net/2027.42/126855
dc.description.abstract: Data warehousing is an important application of database technology. Even though XML is ubiquitous and there are many XML databases, there are almost no XML data warehouses today. This thesis addresses two of the many barriers to building one: representing multiple hierarchies within an XML database used as a warehouse, and manipulating them efficiently. The XML format is flexible and permits the graceful representation of heterogeneous data. However, it is limited in that it assumes there is a single perfect hierarchy in which the data can be organized. When the information to be represented naturally has multiple dimensions, as in data warehouses, fundamental tensions appear in the modeling and schema design. Data represented as deep trees is often unnormalized, leading to update anomalies, while normalized data tends to be shallow, resulting in heavy use of expensive value-based joins. As a solution, we propose an evolutionary and novel extension of the standard one-dimensional XML data model into a multi-dimensional model, called the Multi-Colored Trees (MCT) logical data model. MCT permits trees with multi-colored nodes to signify participation in multiple dimensions. We have developed algorithms to transform design specifications given as ER diagrams into MCT schemas. These MCT schemas satisfy various desirable properties, such as node normal form, edge normal form, and association recoverability. Experimental studies with warehousing data show that the schemas we designed in MCT have many benefits over conventional XML schemas, including query efficiency, ease of query expression, and avoidance of update anomalies. Even after modeling issues are resolved, we still have to consider issues of efficient implementation. We extend bitmap join indices to the XML context, and demonstrate experimentally their benefit for typical queries, including those with low cardinality or high selectivity.
We also consider the data cube, a core warehouse analysis operator involving aggregations along multiple dimensions, and show that it cannot readily be expressed or evaluated for XML data. Specifically, XML data is not always summarizable because of missing and repeated sub-elements. We define an XML version of the OLAP cube operator, and appropriately extend relational cube computation algorithms.
dc.format.extent: 152 p.
dc.language: English
dc.language.iso: EN
dc.subject: Data Warehousing
dc.subject: Heterogeneous Data
dc.subject: Multicolored Trees
dc.subject: Xml
dc.title: XML data warehousing.
dc.type: Thesis
dc.description.thesisdegreename: PhD (en_US)
dc.description.thesisdegreediscipline: Applied Sciences
dc.description.thesisdegreediscipline: Computer science
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/126855/2/3276329.pdf
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
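
Illustrative note on the abstract's summarizability claim. The sketch below is not taken from the thesis and does not implement its XML cube operator or its algorithms. It only shows, on a small invented document, why missing and repeated sub-elements make naive roll-ups unreliable. The element names (publications, paper, author, year) and the helper count_by are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data: paper A repeats <author>, paper C has no <year>.
DOC = """
<publications>
  <paper><title>A</title><author>Ann</author><author>Bob</author><year>2005</year></paper>
  <paper><title>B</title><author>Ann</author><year>2006</year></paper>
  <paper><title>C</title><author>Cy</author></paper>
</publications>
"""

def count_by(root, tag):
    """Count papers per value of a child element.

    A paper contributes once per matching child, so it may be counted
    zero times (child missing) or several times (child repeated).
    """
    counts = Counter()
    for paper in root.findall("paper"):
        for child in paper.findall(tag):
            counts[child.text] += 1
    return counts

root = ET.fromstring(DOC)
total = len(root.findall("paper"))
by_author = count_by(root, "author")
by_year = count_by(root, "year")

# In a summarizable dimension the group counts would add up to the total.
print("total papers:", total)
print("sum of per-author counts:", sum(by_author.values()))
print("sum of per-year counts:", sum(by_year.values()))
```

On this sample, the per-author counts add up to more than the number of papers because paper A is counted under both of its authors, and the per-year counts add up to less because paper C has no year. Under these assumptions, neither grouping can be rolled up to the grand total by simple addition, which is the kind of difficulty the abstract attributes to XML data.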

