It's Data All the Way Down: Exploring the Relationship Between Machine Learning and Data Management
dc.contributor.author | Anderson, Michael | |
dc.date.accessioned | 2020-01-27T16:26:00Z | |
dc.date.available | NO_RESTRICTION | |
dc.date.available | 2020-01-27T16:26:00Z | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/153458 | |
dc.description.abstract | Data is central to machine learning: models are trained with data, trained models infer their predictions over input data, and the resulting inferences are themselves data. This being the case, there should be a natural relationship between machine learning and data management techniques. Much of machine learning research, perhaps understandably, focuses strictly on algorithmic improvements, chasing ever-increasing state-of-the-art accuracy measurements on a task of choice. Likewise, data management research has been slow to incorporate recent machine learning breakthroughs, like deep learning, into classic data management tasks. In this dissertation, we will demonstrate this relationship between machine learning and data management with a series of projects that improve aspects of machine learning through data management or improve data management with the addition of machine learning. Specifically, we detail two systems that use database-style methods to improve runtime issues traditionally associated with machine learning and a third project that uses recent machine learning methods to solve data quality issues. Our system Zombie shows that novel data indexing methods can greatly reduce the time needed to evaluate the effectiveness of feature engineering, thereby reducing the time needed to train accurate machine learning models. With our system Tahoma, we show that by using particular physical representations of the images used as input into convolutional neural network classifier cascades, content can be quickly extracted to support binary predicates used in a video analytics database. And our system Grover demonstrates that universal embeddings, like those used in computer vision or natural language processing, can be created for relational data, with both column and table embeddings used to improve the performance of data integration tasks. 
Our work shows machine learning and data management go hand-in-hand, and taking a holistic view of both can lead to improvements in each field. | |
dc.language.iso | en_US | |
dc.subject | database | |
dc.subject | machine learning | |
dc.subject | deep learning | |
dc.subject | data management | |
dc.subject | feature engineering | |
dc.subject | image recognition | |
dc.title | It's Data All the Way Down: Exploring the Relationship Between Machine Learning and Data Management | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Computer Science & Engineering | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Cafarella, Michael John | |
dc.contributor.committeemember | Collins-Thompson, Kevyn | |
dc.contributor.committeemember | Jagadish, Hosagrahar V | |
dc.contributor.committeemember | Wenisch, Thomas F | |
dc.subject.hlbsecondlevel | Computer Science | |
dc.subject.hlbtoplevel | Engineering | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/153458/1/mrander_1.pdf | |
dc.identifier.orcid | 0000-0002-0959-4234 | |
dc.identifier.name-orcid | Anderson, Michael; 0000-0002-0959-4234 | en_US |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |