Mining decentralized data repositories.

dc.contributor.author: Jensen, Viviane Crestana
dc.contributor.advisor: Soparkar, Nandit
dc.date.accessioned: 2016-08-30T16:23:24Z
dc.date.available: 2016-08-30T16:23:24Z
dc.date.issued: 2001
dc.identifier.uri: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:3024752
dc.identifier.uri: https://hdl.handle.net/2027.42/126978
dc.description.abstract: Technology for data mining, i.e., finding useful trends and patterns in large data repositories, has acquired significant importance with the increasing availability of online data. While such technology is typically applied to centrally stored data, real-life database design, management, and performance considerations call for mining decentralized data, which consists of several tables, perhaps obtained via normalization or partitioning and allocation, stored in several repositories with possibly separate administration and schemas. The few prior extensions of mining to such data rely on algorithms developed largely for parallel processing rather than addressing the issues specific to decentralized data. Most approaches to mining decentralized data require the separate tables to be joined to form a single table. In contrast, this dissertation presents techniques for mining decentralized data that do not require the join of all tables. The approach exploits foreign key relationships to develop decentralized algorithms that execute concurrently on the separate tables and thereafter merge the results. We develop our techniques using the specific example of association rule discovery. Important issues concerning the merging of partial results, the computation and memory requirements, and the associated costs and trade-offs are examined. Several decentralized strategies arise, and an algebra is presented that allows enumeration of the many different decentralized mining strategies, each with different processing costs. Based on this algebra, heuristics are developed that reduce the overall computation, I/O, and communication costs. When cost estimates are available for the basic operations, there is an opportunity to optimize for the best strategy in a manner similar to query processing. As such, our approach may be suitably integrated with available query processing algorithms for large-scale decentralized data mining.
Our decentralized approach is empirically validated, and in cases of interest it performs significantly better than the typical centralized approach. Several decentralized alternatives are implemented, and the heuristic rules are validated, i.e., are shown to choose optimal or nearly optimal plans. The decentralized approach presented in this dissertation may be adapted to different counting strategies, different storage structures, incremental mining, and to exploit indices and summary data where available; some of these improvements are infeasible in a centralized approach. This dissertation provides an approach to decentralized mining that establishes its feasibility and importance, and opens numerous new avenues for research in data mining.
dc.format.extent: 134 p.
dc.language: English
dc.language.iso: EN
dc.subject: Data Mining
dc.subject: Decentralized Data Repositories
dc.subject: Query Optimization
dc.title: Mining decentralized data repositories.
dc.type: Thesis
dc.description.thesisdegreename: PhD (en_US)
dc.description.thesisdegreediscipline: Applied Sciences
dc.description.thesisdegreediscipline: Computer science
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/126978/2/3024752.pdf
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
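
Illustrative note: the abstract describes counting itemsets on separate tables and merging the results through foreign keys rather than materializing the join. The following Python sketch shows one way that idea can work for a single dimension table and a single fact table. It is not the dissertation's algorithm or algebra. The function names, the toy data layout, and the brute-force subset enumeration are all assumptions made for illustration, and the sketch further assumes that the two tables use disjoint item names and that every foreign key in the fact table exists in the dimension table.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(rows, weights, min_count):
    """Weighted support counting by brute-force subset enumeration.

    Only suitable for tiny examples; a real miner would prune candidates
    level by level (Apriori-style) instead of enumerating every subset.
    """
    counts = Counter()
    for items, weight in zip(rows, weights):
        for k in range(1, len(items) + 1):
            for combo in combinations(sorted(items), k):
                counts[frozenset(combo)] += weight
    return {s: c for s, c in counts.items() if c >= min_count}


def mine_joined(dim, fact, min_count):
    """Baseline: materialize the join, then count on the single table."""
    joined = [dim[key] | items for key, items in fact]
    return frequent_itemsets(joined, [1] * len(joined), min_count)


def mine_without_join(dim, fact, min_count):
    """Hypothetical decentralized variant: count per table, then merge.

    dim:  {key: set of items}          (dimension table, one row per key)
    fact: [(foreign_key, set of items)] (fact table referencing dim)
    """
    # How often each dimension row would be repeated in the join.
    fk_weight = Counter(key for key, _ in fact)

    # Local counting on each table; dimension rows are weighted by
    # their foreign-key frequency so supports match the joined table.
    keys = list(dim)
    local_dim = frequent_itemsets(
        [dim[k] for k in keys], [fk_weight[k] for k in keys], min_count
    )
    local_fact = frequent_itemsets(
        [items for _, items in fact], [1] * len(fact), min_count
    )

    # Merge step: a cross-table itemset can only be frequent if both of
    # its single-table parts are frequent, so only those are combined.
    cross = Counter()
    for key, items in fact:
        dim_parts = [x for x in local_dim if x <= dim[key]]
        fact_parts = [y for y in local_fact if y <= items]
        for x in dim_parts:
            for y in fact_parts:
                cross[x | y] += 1

    result = {**local_dim, **local_fact}
    result.update({s: c for s, c in cross.items() if c >= min_count})
    return result


# Toy data (made up for illustration): customers and their purchases.
customers = {
    1: {"age=young", "city=AA"},
    2: {"age=old", "city=AA"},
}
purchases = [
    (1, {"milk", "bread"}),
    (1, {"milk"}),
    (2, {"bread"}),
    (1, {"milk", "beer"}),
]

decentralized = mine_without_join(customers, purchases, min_count=2)
centralized = mine_joined(customers, purchases, min_count=2)
```

Under the stated assumptions, and for a minimum count of at least 1, both functions return the same frequent itemsets with the same support counts. The sketch does not model the cost trade-offs, alternative strategies, or heuristics that the abstract mentions.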

