
Bridging Data and Hardware Gap for Efficient Machine Learning Model Scaling

dc.contributor.author	Zheng, Haizhong
dc.date.accessioned	2025-01-06T18:19:07Z
dc.date.available	2025-01-06T18:19:07Z
dc.date.issued	2024
dc.date.submitted	2024
dc.identifier.uri	https://hdl.handle.net/2027.42/196111
dc.description.abstract	Recent research on deep learning models has achieved astonishing progress in domains such as image classification, text generation, and image generation. With the exponential growth of model sizes and data volumes, AI models are more capable than ever before and have been deployed in many real-world scenarios, such as chatbots and autonomous driving, marking a watershed moment in artificial intelligence. Despite our hope for further performance gains through scaling up model and dataset sizes, today’s large models are reaching scaling limits in two respects. First, large model training is data-hungry: the cost of creating large volumes of high-quality human feedback data is prohibitively expensive, creating a bottleneck in scaling up large model training. Second, naive reliance on more powerful hardware is inadequate, as hardware improves at a much slower rate than the growth in model size demands. It is therefore more important than ever to design more efficient algorithms and models for both data and inference efficiency. Fortunately, recent research shows that both datasets and models exhibit substantial redundancy, providing opportunities to optimize performance and reduce computational costs. This dissertation aims to bridge the gap between the rapid scaling of models and the slower scaling of high-quality data and hardware. For data efficiency, this dissertation designs several novel coreset selection and data condensation algorithms that select or synthesize a small but representative training dataset, reducing redundancy in the data. First, we show that coresets with better coverage of the underlying data distribution lead to better training performance. Based on this observation, we propose a coverage-centric coreset selection algorithm that significantly improves coreset selection performance. Second, we propose ELFS, a coreset selection algorithm for the label-free setting. Human labeling is one of the major bottlenecks in data collection; given a limited human labeling budget, ELFS identifies a more representative subset for labeling. Beyond coreset selection, we also explore other techniques to improve data efficiency. In our third data efficiency project, we study how to improve data condensation performance. Instead of selecting a data subset, data condensation synthesizes a small synthetic dataset that captures the knowledge of a natural dataset. We propose a novel data container structure, HMN, which exploits the hierarchical structure of the classification task to store information more efficiently. In our last data efficiency project, we propose a novel adversarial training algorithm that significantly reduces the overhead of the data augmentation phase of adversarial training. For inference efficiency, this dissertation focuses primarily on building hardware-friendly contextual sparse models that activate only the necessary neurons during inference to reduce memory-bandwidth overhead. To achieve this goal, we propose LTE, an efficiency-aware training algorithm for hardware-friendly contextual sparse models that accelerates inference without sacrificing model performance. Finally, we conclude and discuss possible future research directions and opportunities to further enhance data and inference efficiency.
dc.language.iso	en_US
dc.subject	Efficient Machine Learning
dc.subject	Scaling Law
dc.subject	Data Efficiency
dc.title	Bridging Data and Hardware Gap for Efficient Machine Learning Model Scaling
dc.type	Thesis
dc.description.thesisdegreename	PhD
dc.description.thesisdegreediscipline	Computer Science & Engineering
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Prakash, Atul
dc.contributor.committeemember	Mei, Qiaozhu
dc.contributor.committeemember	Chai, Joyce
dc.contributor.committeemember	Chowdhury, Mosharaf
dc.subject.hlbsecondlevel	Computer Science
dc.subject.hlbtoplevel	Engineering
dc.contributor.affiliationumcampus	Ann Arbor
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/196111/1/hzzheng_1.pdf
dc.identifier.doi	https://dx.doi.org/10.7302/25047
dc.identifier.orcid	0000-0003-0478-4028
dc.identifier.name-orcid	Zheng, Haizhong; 0000-0003-0478-4028	en_US
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)
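
The coverage-centric coreset selection idea summarized in the abstract can be illustrated with a minimal sketch. The snippet below is not the dissertation's algorithm; it is a hypothetical selector that scores examples with a difficulty proxy and then spends a fixed budget across difficulty strata so the coreset covers the whole score distribution rather than only the hardest examples. All function and variable names (e.g., coverage_centric_coreset, num_strata) are assumptions for illustration.

```python
import numpy as np

def coverage_centric_coreset(scores: np.ndarray, budget: int,
                             num_strata: int = 10, seed: int = 0) -> np.ndarray:
    """Illustrative (hypothetical) coverage-centric coreset selection.

    Instead of keeping only the highest-scoring (hardest) examples, bin the
    data into difficulty strata and draw from every stratum, so the selected
    subset covers the underlying score distribution.
    """
    rng = np.random.default_rng(seed)
    # Bin examples into strata by their difficulty score (e.g., loss or
    # forgetting statistics from a proxy model trained beforehand).
    edges = np.quantile(scores, np.linspace(0.0, 1.0, num_strata + 1))
    bins = np.digitize(scores, edges[1:-1], right=True)
    strata = [np.flatnonzero(bins == b) for b in range(num_strata)]
    strata = [s for s in strata if len(s) > 0]

    selected = []
    remaining = budget
    # Allocate the budget across strata, smallest stratum first, so sparse
    # regions of the distribution are fully covered and leftover budget
    # flows to the larger strata.
    for i, stratum in enumerate(sorted(strata, key=len)):
        quota = remaining // (len(strata) - i)
        take = min(quota, len(stratum))
        selected.append(rng.choice(stratum, size=take, replace=False))
        remaining -= take
    return np.concatenate(selected)

# Example: select a 10% coreset from 50k examples with random proxy scores.
if __name__ == "__main__":
    scores = np.random.default_rng(1).random(50_000)
    coreset_idx = coverage_centric_coreset(scores, budget=5_000)
    print(coreset_idx.shape)  # (5000,)
```

In this sketch the coverage constraint is the stratified budget allocation; a pure hardness-ranked baseline would instead take the top-budget scores directly, which tends to over-sample a narrow region of the data distribution.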


