Mining decentralized data repositories.

Jensen, Viviane Crestana

dc.contributor.author	Jensen, Viviane Crestana
dc.contributor.advisor	Soparkar, Nandit
dc.date.accessioned	2016-08-30T16:23:24Z
dc.date.available	2016-08-30T16:23:24Z
dc.date.issued	2001
dc.identifier.uri	http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:3024752
dc.identifier.uri	https://hdl.handle.net/2027.42/126978
dc.description.abstract	Technology for data mining, i.e., finding useful trends and patterns in large data repositories, has acquired significant importance with increasing availability of online data. While such technology is typically applied to centrally stored data, real-life database design and management, and performance aspects suggest the mining of decentralized data, which consists of several tables, perhaps obtained via normalization or partitioning and allocation, stored in several repositories with possibly separate administration and schema. The few prior extensions to mining for such data have algorithms developed largely for parallel processing as opposed to addressing the specific issues for decentralized data. Most approaches to mining decentralized data require the separate tables to be joined to form a single table. In contrast, this dissertation presents techniques for mining decentralized data that do not require the join of all tables. The approach exploits foreign key relationships to develop decentralized algorithms that execute concurrently on the separate tables, and thereafter merge the results. We develop our techniques using the specific example of association rules discovery. Important issues concerning the merging of partial results, the computation and memory requirements, and the associated costs and trade-offs are examined. Several different decentralized strategies arise, and an algebra is presented which allows enumeration of the many different decentralized mining strategies, each with different processing costs. Based on this algebra, heuristics are developed that reduce the overall computation, I/O, and communication costs. When cost estimates are available for the basic operations, there is an opportunity to optimize for the best strategy in a manner similar to query processing. As such, our approach may be suitably integrated with available query processing algorithms for large-scale decentralized data mining. 
Our decentralized approach is empirically validated, and in cases of interest it performs significantly better than the typical centralized approach. Several decentralized alternatives are implemented, and the heuristic rules are validated, i.e., are shown to choose optimal or nearly optimal plans. The decentralized approach presented in this dissertation may be adapted to different counting strategies, different storage structures, incremental mining, and to exploit indices and summary data where available; some of these improvements are infeasible in a centralized approach. This dissertation provides an approach to decentralized mining that establishes its feasibility and importance, and opens numerous new avenues for research in data mining.
dc.format.extent	134 p.
dc.language	English
dc.language.iso	EN
dc.subject	Data Mining
dc.subject	Decentralized Data Repositories
dc.subject	Query Optimization
dc.title	Mining decentralized data repositories.
dc.type	Thesis
dc.description.thesisdegreename	PhD
dc.description.thesisdegreediscipline	Applied Sciences
dc.description.thesisdegreediscipline	Computer science
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/126978/2/3024752.pdf
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: 3024752.pdf
Size: 4.053MB
Format: PDF
Description: Access Restricted to UM users only.
