
Fast Data Analytics by Learning

dc.contributor.author: Park, Yongjoo
dc.date.accessioned: 2017-10-05T20:28:03Z
dc.date.available: NO_RESTRICTION
dc.date.available: 2017-10-05T20:28:03Z
dc.date.issued: 2017
dc.date.submitted: 2017
dc.identifier.uri: https://hdl.handle.net/2027.42/138598
dc.description.abstract: Today, we collect a large amount of data, and the volume of the data we collect is projected to grow faster than the growth of computational power. This rapid growth of data inevitably increases query latencies, and horizontal scaling alone is not sufficient for real-time analytics of big data. Approximate query processing (AQP) speeds up data analytics at the cost of small quality losses in query answers. AQP produces query answers based on synopses of the original data. Because the synopses are smaller than the original data, AQP requires less computational effort to produce query answers and can therefore produce them more quickly. In AQP, there is a general tradeoff between query latency and the quality of query answers; obtaining higher-quality answers requires longer query latencies. In this dissertation, we show that we can speed up approximate query processing without reducing the quality of query answers by optimizing the synopses using two approaches. The two approaches we employ for optimizing the synopses are as follows: 1. Exploiting past computations: We exploit the answers to past queries. This approach relies on the fact that, if two aggregations involve common or correlated values, the aggregated results must also be correlated. We formally capture this idea using a probabilistic distribution function, which is then used to refine the answers to new queries (an illustrative sketch of this idea follows the record below). 2. Building task-aware synopses: By optimizing synopses for a few common types of data analytics tasks, we can produce higher-quality answers for those tasks (or equally accurate answers more quickly for a given target quality). We use this approach to construct synopses optimized for searching and visualization. For both exploiting past computations and building task-aware synopses, our work incorporates statistical inference and optimization techniques. The contributions in this dissertation resulted in up to 20x speedups for real-world data analytics workloads.
dc.language.iso: en_US
dc.subject: big data analytics systems
dc.subject: database systems
dc.subject: approximate query processing
dc.subject: database learning
dc.title: Fast Data Analytics by Learning
dc.type: Thesis (en_US)
dc.description.thesisdegreename: PhD (en_US)
dc.description.thesisdegreediscipline: Computer Science & Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Cafarella, Michael John
dc.contributor.committeemember: Mozafari, Barzan
dc.contributor.committeemember: Lagoze, Carl
dc.contributor.committeemember: Adar, Eytan
dc.contributor.committeemember: Jagadish, Hosagrahar V
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbtoplevel: Engineering
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/138598/1/pyongjoo_1.pdf
dc.identifier.orcid: 0000-0003-3786-6214
dc.identifier.name-orcid: Park, Yongjoo; 0000-0003-3786-6214 (en_US)
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
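
The first approach in the abstract (exploiting past computations) rests on the observation that answers to overlapping aggregate queries are correlated. The Python sketch below illustrates that statistical idea with invented numbers; it is not the inference procedure developed in the dissertation. It assumes a simple two-variable Gaussian model in which a past query's already-computed answer is used, via standard Gaussian conditioning, to refine both the estimate and the uncertainty of a new, overlapping query.

    # A minimal, illustrative sketch (not the dissertation's actual algorithm):
    # treat the answers to a past query and a new query as jointly Gaussian,
    # then condition the new answer on the already-computed past answer.
    # All numbers below are made up for illustration.
    import numpy as np

    # Prior beliefs from a cheap sample-based AQP estimate:
    #   past query q1: AVG(revenue) over 2016        -> mean 100, std 5 (already computed)
    #   new  query q2: AVG(revenue) over 2016 Q1-Q3  -> mean  90, std 8 (raw sample estimate)
    mu = np.array([100.0, 90.0])     # prior means [q1, q2]
    sigma = np.array([5.0, 8.0])     # prior standard deviations
    rho = 0.7                        # assumed correlation: the two queries share most of their rows

    # Joint covariance of (q1, q2) under the Gaussian model
    cov = np.array([
        [sigma[0] ** 2,             rho * sigma[0] * sigma[1]],
        [rho * sigma[0] * sigma[1], sigma[1] ** 2],
    ])

    # The past query's answer was observed in an earlier execution.
    observed_q1 = 97.0

    # Standard Gaussian conditioning: refine q2 given q1 = observed_q1.
    gain = cov[1, 0] / cov[0, 0]                        # regression coefficient of q2 on q1
    posterior_mean = mu[1] + gain * (observed_q1 - mu[0])
    posterior_var = cov[1, 1] - gain * cov[0, 1]

    print(f"raw estimate for q2 : mean={mu[1]:.2f}, std={sigma[1]:.2f}")
    print(f"refined estimate    : mean={posterior_mean:.2f}, std={np.sqrt(posterior_var):.2f}")

In this toy setup, conditioning on the past answer both shifts the new estimate and shrinks its standard deviation (from 8 to roughly 5.7), which is the sense in which past computations can improve answer quality without scanning any additional data.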

