
Fast Data Analytics by Learning

dc.contributor.author: Park, Yongjoo
dc.date.accessioned: 2017-10-05T20:28:03Z
dc.date.available: NO_RESTRICTION
dc.date.available: 2017-10-05T20:28:03Z
dc.date.issued: 2017
dc.date.submitted: 2017
dc.identifier.uri: https://hdl.handle.net/2027.42/138598
dc.description.abstract: Today, we collect a large amount of data, and the volume of the data we collect is projected to grow faster than the growth of computational power. This rapid growth of data inevitably increases query latencies, and horizontal scaling alone is not sufficient for real-time analytics of big data. Approximate query processing (AQP) speeds up data analytics at the cost of small quality losses in query answers. AQP produces query answers based on synopses of the original data. Because the synopses are smaller than the original data, AQP requires less computational effort to produce query answers and can therefore produce them more quickly. In AQP, there is a general tradeoff between query latency and the quality of query answers; obtaining higher-quality answers requires longer query latencies. In this dissertation, we show that we can speed up approximate query processing without reducing the quality of query answers by optimizing the synopses using two approaches. The two approaches we employ for optimizing the synopses are as follows: 1. Exploiting past computations: We exploit the answers to past queries. This approach relies on the fact that, if two aggregations involve common or correlated values, the aggregated results must also be correlated. We formally capture this idea using a probabilistic distribution function, which is then used to refine the answers to new queries (an illustrative sketch of this idea follows the record below). 2. Building task-aware synopses: By optimizing synopses for a few common types of data analytics tasks, we can produce higher-quality answers for those tasks (or equally accurate answers more quickly for a given target quality). We use this approach to construct synopses optimized for searching and visualization. For both exploiting past computations and building task-aware synopses, our work incorporates statistical inference and optimization techniques. The contributions in this dissertation resulted in up to 20x speedups for real-world data analytics workloads.
dc.language.iso: en_US
dc.subject: big data analytics systems
dc.subject: database systems
dc.subject: approximate query processing
dc.subject: database learning
dc.title: Fast Data Analytics by Learning
dc.type: Thesis (en_US)
dc.description.thesisdegreename: PhD (en_US)
dc.description.thesisdegreediscipline: Computer Science & Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Cafarella, Michael John
dc.contributor.committeemember: Mozafari, Barzan
dc.contributor.committeemember: Lagoze, Carl
dc.contributor.committeemember: Adar, Eytan
dc.contributor.committeemember: Jagadish, Hosagrahar V
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbtoplevel: Engineering
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/138598/1/pyongjoo_1.pdf
dc.identifier.orcid: 0000-0003-3786-6214
dc.identifier.name-orcid: Park, Yongjoo; 0000-0003-3786-6214 (en_US)
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
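
The first approach in the abstract (exploiting past computations) rests on the observation that answers to overlapping aggregate queries are correlated. The Python sketch below illustrates that statistical idea with invented numbers; it is not the inference procedure developed in the dissertation. It assumes a simple two-variable Gaussian model in which a past query's already-computed answer is used, via standard Gaussian conditioning, to refine both the estimate and the uncertainty of a new, overlapping query.

    # A minimal, illustrative sketch (not the dissertation's actual algorithm):
    # treat the answers to a past query and a new query as jointly Gaussian,
    # then condition the new answer on the already-computed past answer.
    # All numbers below are made up for illustration.
    import numpy as np

    # Prior beliefs from a cheap sample-based AQP estimate:
    #   past query q1: AVG(revenue) over 2016        -> mean 100, std 5 (already computed)
    #   new  query q2: AVG(revenue) over 2016 Q1-Q3  -> mean  90, std 8 (raw sample estimate)
    mu = np.array([100.0, 90.0])     # prior means [q1, q2]
    sigma = np.array([5.0, 8.0])     # prior standard deviations
    rho = 0.7                        # assumed correlation: the two queries share most of their rows

    # Joint covariance of (q1, q2) under the Gaussian model
    cov = np.array([
        [sigma[0] ** 2,             rho * sigma[0] * sigma[1]],
        [rho * sigma[0] * sigma[1], sigma[1] ** 2],
    ])

    # The past query's answer was observed in an earlier execution.
    observed_q1 = 97.0

    # Standard Gaussian conditioning: refine q2 given q1 = observed_q1.
    gain = cov[1, 0] / cov[0, 0]                        # regression coefficient of q2 on q1
    posterior_mean = mu[1] + gain * (observed_q1 - mu[0])
    posterior_var = cov[1, 1] - gain * cov[0, 1]

    print(f"raw estimate for q2 : mean={mu[1]:.2f}, std={sigma[1]:.2f}")
    print(f"refined estimate    : mean={posterior_mean:.2f}, std={np.sqrt(posterior_var):.2f}")

In this toy setup, conditioning on the past answer both shifts the new estimate and shrinks its standard deviation (from 8 to roughly 5.7), which is the sense in which past computations can improve answer quality without scanning any additional data.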

