Algorithm-Hardware Co-design For Efficiency Optimization Of Machine Learning Workloads
Pan, Yunjie
2025
Abstract
Machine learning (ML) is now a fundamental technology, powering advances in AI chatbots, autonomous vehicles, and recommendation systems. ML model development follows a structured workflow known as the machine learning pipeline, which includes data preprocessing, model training, and model deployment. As the computational demands of ML workloads continue to grow, optimizing performance and efficiency has become a critical challenge. Traditional approaches on current hardware struggle to meet the increasing requirements for scalability, power efficiency, and processing speed, especially for Convolutional Neural Network (CNN) deployment on resource-constrained devices, Large Language Model (LLM) training, and graph mining on large-scale graphs. To address these challenges across the ML pipeline, this thesis explores approximation techniques that balance accuracy and efficiency to close the efficiency gap.

For the model deployment stage, specifically CNN inference, this thesis presents BitSET, a software-hardware co-design approach that employs prediction-based bit-level early termination to reduce energy consumption with minimal accuracy degradation. Leveraging a runtime characteristic of CNNs (more than half of the output values after the ReLU layer are zero), this work introduces a low-overhead algorithm that predicts zero output values and skips the associated computation early. A novel weight encoding scheme and a bit-serial CNN accelerator further enhance efficiency, achieving a 1.6x speedup and a 1.4x energy efficiency improvement under a 1% accuracy-loss budget.

For the model training stage, specifically LLM pretraining, this thesis introduces SNIP, a fine-grained mixed-precision training framework designed to optimize efficiency while preserving model quality. Existing mixed-precision training frameworks typically apply a uniform precision setting to every General Matrix Multiplication (GEMM) operation, ignoring the fact that different layers have varying sensitivity to precision loss. SNIP overcomes this limitation by dynamically selecting layer-wise precision to minimize quality loss while achieving efficiency targets. To quantify the impact of quantization on model quality, SNIP introduces two key metrics, forward loss divergence and backward weight divergence, and formulates precision selection as an Integer Linear Programming (ILP) optimization problem, enabling an optimal balance between efficiency and accuracy. SNIP significantly outperforms existing quantization baselines, achieving sub-byte-precision training while maintaining accuracy close to that of high-precision models.

During the data preprocessing stage, graph mining extracts high-level topological information from large graphs, enabling more structured representations for downstream tasks. This thesis addresses the challenge of efficiently counting small temporal motifs in massive graphs with billions of timestamped edges, where the combinatorial explosion of motif enumeration poses a major computational bottleneck. To address this, the thesis introduces TEACUPS and TIMEST, two sampling-based temporal motif counting algorithms that estimate motif counts with high accuracy while significantly reducing the search space. TEACUPS leverages a 3-path sampling method for 4-vertex motifs; TIMEST extends TEACUPS with a spanning-tree sampler, relaxed constraints, and a sliding-window technique to efficiently construct sampling weights and perform fast, scalable motif estimation on large temporal graphs.
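To give a flavor of the estimator shape these samplers share (without reproducing either algorithm), the sketch below estimates the simplest temporal motif, a wedge u->v->w whose second edge follows the first within a window delta, by sampling edges uniformly and applying a Horvitz-Thompson (inverse-probability) correction. The function name, the wedge motif, and the toy data are illustrative assumptions, not the thesis's API.

```python
# Minimal sketch of sampling-based temporal motif estimation, shown for
# temporal wedges: edge pairs ((u,v,t1),(v,w,t2)) with 0 < t2-t1 <= delta.
# TEACUPS/TIMEST use richer samplers (3-paths, spanning trees, sliding
# windows); only the inverse-probability estimator shape appears here.
import bisect
import random
from collections import defaultdict

def estimate_temporal_wedges(edges, delta, num_samples, seed=0):
    """edges: list of (u, v, t) with timestamps t. Returns an unbiased
    estimate of the number of temporal wedges within window delta."""
    rng = random.Random(seed)
    out_times = defaultdict(list)           # node -> sorted out-edge timestamps
    for u, v, t in edges:
        out_times[u].append(t)
    for node in out_times:
        out_times[node].sort()

    m = len(edges)
    total = 0.0
    for _ in range(num_samples):
        u, v, t1 = edges[rng.randrange(m)]  # sample a first edge uniformly (prob 1/m)
        ts = out_times.get(v, [])
        lo = bisect.bisect_right(ts, t1)            # out-edges of v strictly after t1
        hi = bisect.bisect_right(ts, t1 + delta)    # ...and inside the time window
        total += (hi - lo) * m                      # Horvitz-Thompson: count / (1/m)
    return total / num_samples

# Invented toy graph: the exact wedge count for delta=1.0 is 2.
E = [(0, 1, 1.0), (1, 2, 1.5), (1, 3, 2.0), (2, 3, 9.0)]
print(estimate_temporal_wedges(E, delta=1.0, num_samples=10_000))  # ~2.0
```

The estimate is unbiased because each wedge is charged exactly once, to its first edge, and each sample costs only two binary searches rather than an enumeration over all edge pairs; replacing enumeration with a fixed number of cheap samples is the mechanism behind the runtime reductions reported next.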
TEACUPS and TIMEST reduce runtimes from weeks to minutes while maintaining an error rate of around 5%, making motif counting feasible for real-world applications. All of these projects leverage approximation techniques to balance accuracy and efficiency, addressing the efficiency gap between current hardware capabilities and the growing computational demands of ML workloads.
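The other two components can be sketched in the same spirit. First, the early-termination idea behind BitSET: process activation bits from most to least significant, keep a running partial sum plus an upper bound on what the remaining low-order bits could still contribute, and emit zero as soon as the bound shows the pre-activation cannot become positive (ReLU would clamp it anyway). This is a minimal sketch of only the termination test, in its exact, conservative form; BitSET's predictive thresholding, weight encoding, and accelerator are not modeled, and all names are hypothetical.

```python
# Minimal sketch: bit-level early termination for one output neuron.
# Activations are unsigned (post-ReLU) fixed-point values processed
# bit-serially from MSB to LSB; pos_weight_sum bounds what the
# remaining low-order bit-planes can still add to the partial sum.

def bit_serial_relu(acts, weights, num_bits=8):
    """acts: unsigned ints < 2**num_bits; weights: signed ints.
    Returns (relu_output, bit_planes_processed)."""
    pos_weight_sum = sum(w for w in weights if w > 0)
    partial = 0
    for b in range(num_bits - 1, -1, -1):       # MSB -> LSB
        plane = sum(w for a, w in zip(acts, weights) if (a >> b) & 1)
        partial += plane << b
        max_remaining = pos_weight_sum * ((1 << b) - 1)
        if partial + max_remaining <= 0:        # sum can never turn positive
            return 0, num_bits - b              # early termination: output is 0
    return max(partial, 0), num_bits            # ReLU on the fully computed sum

print(bit_serial_relu([3, 200, 7], [-4, -1, 2]))  # -> (0, 2): zero after 2 of 8 planes
```

Second, SNIP's precision selection can be illustrated as a small integer program. Below, divergence[i][p] stands in for SNIP's measured metrics (forward loss divergence and backward weight divergence) and cost[i][p] for the relative GEMM cost of precision option p at layer i; all numbers are invented for illustration. SNIP hands this program to an ILP solver; for a handful of layers the same program can be solved exactly by enumeration, which is what this sketch does.

```python
# Minimal sketch: layer-wise precision selection as a constrained
# assignment problem, solved by brute force for tiny instances.
from itertools import product

def select_precisions(divergence, cost, budget):
    """Pick one precision option per layer minimizing total quality
    penalty subject to total relative GEMM cost <= budget."""
    n = len(divergence)
    options = range(len(divergence[0]))
    best_assign, best_penalty = None, float("inf")
    for assign in product(options, repeat=n):
        if sum(cost[i][p] for i, p in enumerate(assign)) > budget:
            continue                            # violates the efficiency target
        penalty = sum(divergence[i][p] for i, p in enumerate(assign))
        if penalty < best_penalty:
            best_assign, best_penalty = assign, penalty
    return best_assign, best_penalty

# Invented example: 3 layers, precision options [fp16, fp8, fp4].
div = [[0.0, 0.02, 0.30],    # layer 0 is precision-sensitive
       [0.0, 0.01, 0.03],
       [0.0, 0.01, 0.02]]
cost = [[1.0, 0.5, 0.25]] * 3
print(select_precisions(div, cost, budget=1.5))
# -> (1, 1, 1): all layers at fp8; the sensitive layer never drops to fp4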
Subjects
Approximate Computing; Software-Hardware Co-Design; Machine Learning; Efficiency Optimization; Graph Mining
Types
Thesis