Design Techniques for Energy-Efficient and Scalable Machine Learning Accelerators
Chou, Teyuh
2022
Abstract
Machine learning is a key application driver of new computing hardware. High-performance machine learning hardware must support a large number of operations and high memory bandwidth, and its energy efficiency is often limited by data movement and the memory access bottleneck. As machine learning models grow larger and more complex over time, meeting their computational requirements is a constant challenge. In this work, we investigate processing-in-memory (PIM) approaches to overcome the memory access bottleneck, and a chiplet-based integration approach to efficiently scale up machine learning hardware by reusing chiplets.

The growth in DNN model size and complexity has already outpaced DNN chip upgrades, and making monolithic chips that keep pace with evolving models is challenging. We demonstrate a chiplet-based approach to designing DNN hardware. The proposed chiplet, called NetFlex, is a modular design whose instances can be connected together to build larger DNN hardware. NetFlex supports multiple configurations and various layer types, including convolutional, deconvolutional, and fully connected layers. The deconvolution dataflow is optimized by removing the computation of both row-wise and element-wise zeros. In the PE arrays, spatial processing with three-dimensional parallelism is implemented for data reuse, and temporal processing is chosen to adapt to different kernel sizes. The processing schedule and memory mapping are designed for streaming activations without extra data rearrangement. The chiplets are connected in a ring topology and can be gated by a skipping module to reduce the computation and memory accesses for simple perception scenes. An Advanced Interface Bus (AIB) and an Advanced eXtensible Interface (AXI)-compatible protocol enable data streaming from one chiplet to another.
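The zero-skipping idea behind the deconvolution dataflow can be illustrated in software. The sketch below is a minimal 1-D model of our own devising, not the NetFlex hardware dataflow: a transposed convolution is computed by scattering input contributions to the output, so the zeros that explicit zero-insertion would create (and any zero-valued inputs) are never multiplied or accumulated.

```python
def deconv1d_scatter(x, w, stride=2):
    """1-D transposed convolution (deconvolution) computed by
    scattering input contributions into the output.

    The inserted zeros of the naive upsample-then-convolve method
    are never generated, and zero-valued inputs are skipped, so no
    multiply-accumulate is wasted on a zero operand."""
    out_len = (len(x) - 1) * stride + len(w)
    y = [0.0] * out_len
    for i, xi in enumerate(x):
        if xi == 0:          # element-wise zero skipping
            continue
        base = i * stride    # where this input lands in the output
        for k, wk in enumerate(w):
            y[base + k] += xi * wk
    return y
```

In this toy model, each nonzero input costs exactly `len(w)` multiply-accumulates, whereas the naive zero-inserted convolution would perform a full kernel's worth of work at every output position regardless of operand value.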
The chiplets are integrated on an interposer using a 2.5D fan-out wafer-level packaging (FOWLP) technology.

The PIM approach has gained significant attention due to its potential for high energy efficiency on DNN workloads. However, key challenges remain: the overhead of high-resolution ADCs, and the degraded sensing margin when a large number of bitcells are activated together. We propose adaptive-range PIM (AR-PIM), which takes advantage of sparsity to relax the need for high-resolution ADCs and to improve the sensing margin.

PIM enables massively parallel dot products while keeping one set of operands in memory, making it ideal for computationally demanding deep neural networks (DNNs) and recurrent neural networks (RNNs). Processing in resistive RAM (RRAM) is particularly appealing due to RRAM's high density and low energy. A key limitation of PIM is the cost of multi-bit analog-to-digital (A/D) conversions, which can defeat the efficiency and performance benefits of PIM. We demonstrate the CASCADE architecture, which connects multiply-accumulate (MAC) RRAM arrays with buffer RRAM arrays to extend processing in the analog domain and in memory: dot products are followed by partial-sum buffering and accumulation to implement a complete DNN or RNN layer. Design choices are made and the interface is designed to enable a variation-tolerant, robust analog dataflow. A new memory mapping scheme, named R-Mapping, is devised to enable in-RRAM accumulation of partial sums, and an analog summation scheme reduces the number of A/D conversions required to obtain the final sum. CASCADE is evaluated against recent in-RRAM computation architectures using state-of-the-art DNN and RNN benchmarks.
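The benefit of accumulating partial sums before digitization can be seen in a toy functional model. The sketch below is an illustrative abstraction of ours, not the CASCADE circuit: a counter stands in for the ADC, and the two dataflows are compared on how many conversions they need to produce one output.

```python
def dot(xs, ws):
    """One MAC-array operation: an analog partial sum (modeled digitally)."""
    return sum(a * b for a, b in zip(xs, ws))

class ADC:
    """Counts A/D conversions so the two dataflows can be compared."""
    def __init__(self):
        self.conversions = 0

    def convert(self, v):
        self.conversions += 1
        return v

def layer_naive(x_slices, w_slices, adc):
    # Digitize every partial sum, then accumulate in the digital domain:
    # one A/D conversion per weight slice.
    return sum(adc.convert(dot(xs, ws)) for xs, ws in zip(x_slices, w_slices))

def layer_cascade_style(x_slices, w_slices, adc):
    # Buffer and accumulate the partial sums before digitization,
    # so only the final accumulated sum needs an A/D conversion.
    acc = 0
    for xs, ws in zip(x_slices, w_slices):
        acc += dot(xs, ws)
    return adc.convert(acc)
```

Both dataflows produce the same layer output, but the accumulate-then-convert version performs one conversion per output instead of one per partial sum; in hardware, that per-conversion cost is exactly the multi-bit A/D overhead the abstract identifies as the key PIM limitation.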
Subjects
Machine learning accelerators
Types
Thesis