Efficient Deep Learning Accelerator Architectures by Model Compression and Data Orchestration

dc.contributor.author    Zhang, Jie-Fang
dc.date.accessioned    2022-05-25T15:43:49Z
dc.date.available    2024-05-01
dc.date.available    2022-05-25T15:43:49Z
dc.date.issued    2022
dc.date.submitted    2022
dc.identifier.uri    https://hdl.handle.net/2027.42/172770
dc.description.abstract    Deep neural networks (DNNs) have become the primary method for solving machine learning and artificial intelligence problems in computer vision, natural language processing, and robotics. Advances in DNN models are to a large degree attributable to growth in model size, complexity, and versatility. This continuous growth imposes intense memory and compute requirements and complicates hardware design, especially for resource-constrained mobile and smart-sensor platforms. To resolve these resource bottlenecks, model compression techniques, e.g., data quantization, network sparsification, and tensor decomposition, have been used to reduce model size while preserving the accuracy of the original model. However, they introduce several computation challenges, including 1) irregular computation in unstructured sparse neural networks (NNs) produced by network sparsification, and 2) complex and arbitrary tensor orchestration for tensor contraction in tensorized NNs. Meanwhile, DNNs have also been applied to new domains that handle drastically different modalities and non-Euclidean data, e.g., point clouds and graphs, where new computation challenges continue to emerge, for example, irregular memory access for graph-structured data in graph-based point-cloud NNs. These challenges lead to low processing efficiency on existing hardware architectures and motivate the exploration of specialized hardware mechanisms and accelerator architectures. This dissertation consists of three works that design efficient accelerator architectures to overcome these computation challenges by exploiting model compression characteristics and data orchestration techniques.

The first work presents the sparse neural acceleration processor (SNAP) to process sparse NNs resulting from unstructured network pruning. SNAP uses parallel associative search to discover valid weight and input-activation pairs for parallel computation. A two-level partial-sum (psum) reduction dataflow eliminates access contention at the output buffer and cuts psum writeback traffic. The SNAP chip achieves a peak effectual efficiency of 21.55 TOPS/W on sparse workloads and 3.61 TOPS/W on pruned ResNet-50.

The second work presents Point-X, a spatial-locality-aware architecture that exploits the spatial locality in point clouds for efficient graph-based point-cloud NN processing. A clustering method extracts fine-grained and coarse-grained spatial locality from the input point cloud to maximize intra-tile computational parallelism and minimize inter-tile data movement, and a chain network-on-chip (NoC) further reduces data traffic, achieving up to a 3.2x speedup over a traditional mesh NoC. The Point-X prototype achieves a throughput of 1307.1 inference/s and an energy efficiency of 604.5 inference/J on the DGCNN workload.
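As an illustration of the clustering idea behind Point-X, here is a minimal Python sketch that groups a point cloud into tiles with a uniform voxel grid, so points processed together are spatially close. This is an illustrative sketch only, not the dissertation's actual method; the grid-based grouping and the tile_size parameter are assumptions.

```python
import numpy as np

def cluster_points(points: np.ndarray, tile_size: float):
    """Group an (N, 3) point cloud into spatial tiles with a uniform grid.

    Illustrative sketch only: Point-X's actual clustering may differ, and
    tile_size is an assumed tuning parameter.
    """
    # Quantize coordinates: points falling into the same grid cell share a
    # voxel index and end up in the same tile (fine-grained locality).
    voxel_ids = np.floor(points / tile_size).astype(np.int64)
    # Label each point with the id of its (unique) voxel.
    _, tile_of_point = np.unique(voxel_ids, axis=0, return_inverse=True)
    # Collect the point indices belonging to each tile.
    order = np.argsort(tile_of_point, kind="stable")
    counts = np.bincount(tile_of_point)
    return np.split(order, np.cumsum(counts)[:-1])

# Usage: map each tile to one compute cluster so that neighbor search for
# the k-NN graph stays mostly intra-tile, cutting inter-tile data movement.
pts = np.random.default_rng(0).random((1024, 3))
tiles = cluster_points(pts, tile_size=0.25)
```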
The third work presents TetriX, an architecture-mapping co-design for efficient and flexible tensorized NN inference. An optimal contraction sequence that minimizes computation and intermediate memory requirements is identified for inference, and a hybrid mapping scheme eliminates complex orchestration operations by alternating between inner-product and outer-product operations. TetriX further uses index translation and output gathering to support flexible orchestration operations efficiently. It is the first work to support all existing tensor decomposition methods for tensorized NNs and demonstrates up to a 3.9x performance improvement over prior work on tensor-train workloads.

Overall, these three works explore the computation of different network optimization techniques. They exploit the full potential of model compression and novel operations, converting it into hardware performance and efficiency. The architectures can also be used to support and further the development of more effective network models.
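To make the tensorized inference above concrete, here is a minimal Python sketch of a tensor-train (TT) matrix-vector product: a 64x64 weight matrix stored as three small TT cores is applied to an input by contracting one core at a time. This is illustrative only, not TetriX's hardware mapping; the shapes, ranks, and the left-to-right contraction order are arbitrary assumptions. The size of the intermediate tensor depends on the contraction sequence, which is the quantity an optimal sequence minimizes.

```python
import numpy as np

# TT representation of a 64x64 weight matrix, factorized as (4*4*4) x (4*4*4).
# Core k has shape (r_{k-1}, m_k, n_k, r_k) with r_0 = r_3 = 1; the ranks
# [1, 2, 2, 1] are arbitrary illustrative choices.
rng = np.random.default_rng(0)
ranks = [1, 2, 2, 1]
cores = [rng.standard_normal((ranks[k], 4, 4, ranks[k + 1])) for k in range(3)]

def tt_matvec(cores, x):
    # Reshape the length-64 input into its three mode factors (j1, j2, j3)
    # and prepend a dummy rank axis of size r_0 = 1.
    t = x.reshape(1, 4, 4, 4)          # axes: (rank, j1, j2, j3)
    for G in cores:
        # Contract the rank axis and the leading input mode, appending the
        # produced output mode at the end so the next input mode comes to
        # the front. The intermediate size depends on this sequence.
        t = np.einsum("rn...,rmns->s...m", t, G)
    return t.reshape(64)               # axes ended as (i1, i2, i3)

# Sanity check against the dense matrix reconstructed from the cores.
x = rng.standard_normal(64)
W = np.einsum("aijb,bklc,cmnd->ikmjln",
              cores[0], cores[1], cores[2]).reshape(64, 64)
assert np.allclose(tt_matvec(cores, x), W @ x)
```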
dc.language.iso    en_US
dc.subject    Accelerator architecture
dc.subject    Hardware design
dc.subject    Deep neural network
dc.subject    Unstructured sparsity
dc.subject    Point-cloud recognition
dc.subject    Tensor decomposition
dc.title    Efficient Deep Learning Accelerator Architectures by Model Compression and Data Orchestration
dc.type    Thesis
dc.description.thesisdegreename    PhD    en_US
dc.description.thesisdegreediscipline    Electrical and Computer Engineering
dc.description.thesisdegreegrantor    University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember    Zhang, Zhengya
dc.contributor.committeemember    Dreslinski Jr, Ronald
dc.contributor.committeemember    Kim, Hun Seok
dc.contributor.committeemember    Sylvester, Dennis Michael
dc.subject.hlbsecondlevel    Computer Science
dc.subject.hlbsecondlevel    Electrical Engineering
dc.subject.hlbtoplevel    Engineering
dc.description.bitstreamurl    http://deepblue.lib.umich.edu/bitstream/2027.42/172770/1/jfzhang_1.pdf    en
dc.identifier.doi    https://dx.doi.org/10.7302/4799
dc.identifier.orcid    0000-0002-6609-4383
dc.identifier.name-orcid    Zhang, Jie-Fang; 0000-0002-6609-4383    en_US
dc.restrict.um    YES
dc.working.doi    10.7302/4799    en
dc.owningcollname    Dissertations and Theses (Ph.D. and Master's)

