Compression and Curriculum Strategies for Efficient Learning in Deep Neural Networks
Ravi Ganesh, Madan
2022
Abstract
The lifecycle of a deep learning application consists of five phases: Data collection, Architecture prototyping, Training, Analysis, and Deployment. There is a significant cost, both human and computational, in all phases of this lifecycle. Given the increasing dominance of deep learning across industry and commerce, reducing these costs while maintaining high performance would have a significant impact. To that end, this work focuses on Architecture prototyping and Training, and proposes new techniques that improve both their efficiency, by reducing the number of Floating Point Operations (FLOPs), and their performance.

Prototyping deep neural networks (DNNs) for hardware-constrained environments is done manually, through architecture search, or through pruning. Manual design and architecture search algorithms require long processing times and large-scale resources to obtain optimal solutions, which limits their usability. While pruning algorithms operate more efficiently than these approaches, they are not effective at modeling the uncertainty in the information flow between layers and its downstream impact when pruning. In Chapters 3 and 4, we propose a single-shot model pruning approach that uses a probabilistic framework to model this uncertainty and decrease the redundancy in the information passed between layers. Within our framework, we use conditional mutual information (CMI) to measure the strength of the contributions between filters in adjacent layers. In addition, we incorporate information from the weight matrices to balance the contributions from CMI, which is computed from the activations. Further, we tackle practical challenges built into pruning pipelines, such as the time complexity of determining the upper pruning limit, or sensitivity, for each layer of the DNN. Our main takeaway is a state-of-the-art single-shot model pruning pipeline, which achieves 72.60% accuracy on ResNet50-ILSVRC2012 at a sparsity of 68.93%. Overall, our pruning approach reduces the number of FLOPs computed during inference by 51.52%.

The second phase we focus on is Training, whose time and resource consumption scale with factors such as the dataset size and the number of epochs. Several techniques, such as low-precision computation and distributed training, focus on making training more efficient. However, they either require a large number of computational resources or rely on approximations that do not fully match the performance of their original counterparts. Instead, we follow the curriculum learning paradigm, which regulates the DNN-dataset interaction from the data side to improve performance while simultaneously affecting the computational load and other properties of training. In Chapters 5 and 6, our primary focus is obtaining high-performing solutions with minimal modifications. We then expand our goals to include improved efficiency and adversarial robustness, both important traits for real-world deployment. To tackle these interconnected goals concurrently, we introduce a feature-based curriculum in Chapter 6 that uses the difference in activation values between the original and noise-perturbed inputs to identify and remove samples susceptible to attacks. By comparing our curriculum against standard and adversarial training regimes, we highlight how our curriculum improves performance in both categories. Overall, our curriculum-based approach to Training reduces computation by 63.8 TFLOPs.
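As a concrete illustration of the CMI-based pruning criterion described above, the following is a minimal sketch, not the dissertation's implementation: it assumes jointly Gaussian activations, under which I(X;Y|Z) reduces to 0.5 * (log det Cov(X,Z) + log det Cov(Y,Z) - log det Cov(Z) - log det Cov(X,Y,Z)), and the names gaussian_cmi, filter_score, and the balance weight alpha are illustrative assumptions, not the thesis API.

```python
# Hedged sketch: score one filter by its CMI with the next layer's
# activations (conditioned on the remaining filters in its own layer),
# blended with a weight-magnitude term. Gaussian approximation assumed.
import numpy as np

def gaussian_cmi(x, y, z, eps=1e-6):
    """I(X; Y | Z) under a joint-Gaussian assumption.

    x, y, z: arrays of shape (n_samples, d_*) holding per-sample filter
    activations (e.g., spatially pooled feature maps).
    """
    def logdet_cov(*parts):
        m = np.concatenate(parts, axis=1)
        # eps * I regularizes near-singular covariances
        cov = np.cov(m, rowvar=False) + eps * np.eye(m.shape[1])
        return np.linalg.slogdet(cov)[1]
    return 0.5 * (logdet_cov(x, z) + logdet_cov(y, z)
                  - logdet_cov(z) - logdet_cov(x, y, z))

def filter_score(act_filter, act_next, act_rest, w_filter, alpha=0.5):
    """Pruning score: CMI between this filter and the next layer, given
    the remaining filters, balanced against the filter's weight mass.
    In practice the two terms would need normalizing to a common scale."""
    cmi = gaussian_cmi(act_filter, act_next, act_rest)
    w_mag = np.abs(w_filter).sum()  # L1 mass of the filter's weight tensor
    return alpha * cmi + (1.0 - alpha) * w_mag
```

Filters with the lowest scores contribute the least unique information to the next layer and would be the first candidates for removal in a single-shot pass.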
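The feature-based curriculum of Chapter 6 admits a sketch in the same hedged spirit. The code below assumes a PyTorch model exposing a features extractor; the noise scale sigma, the quantile cutoff, and the helper name stable_sample_mask are illustrative choices, not the dissertation's.

```python
# Hedged sketch: flag samples whose activations shift most under small
# input noise as attack-susceptible, and drop them from the curriculum.
import torch

@torch.no_grad()
def stable_sample_mask(model, x, sigma=0.05, quantile=0.9):
    """Return a boolean mask keeping samples whose feature drift under
    Gaussian input noise falls below the batch's `quantile` cutoff."""
    model.eval()
    feats_clean = model.features(x)                  # activations on clean inputs
    feats_noisy = model.features(x + sigma * torch.randn_like(x))
    # Per-sample L2 norm of the activation difference
    drift = (feats_noisy - feats_clean).flatten(1).norm(dim=1)
    cutoff = torch.quantile(drift, quantile)
    return drift <= cutoff                           # True = keep for training
```

Training only on the kept samples is what yields the dual effect summarized above: fewer FLOPs per epoch and reduced exposure to the inputs most sensitive to perturbation.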
By proposing techniques that target the prototyping and training phases of the DNN lifecycle, we reduce the number of computations performed, and thereby the burden imposed by their repeated use when developing DNN-based solutions. By imposing multiple constraints during development and training, we enable shorter and more resource-friendly development of DNNs; we also add robustness from a perspective orthogonal to traditional adversarial training, one that does not compromise the performance of DNNs.
Subjects
Efficient Learning, Neural Network Pruning, Curriculum Learning
Types
Thesis