Efficient Deep Learning Accelerator Architectures by Model Compression and Data Orchestration

dc.contributor.author    Zhang, Jie-Fang
dc.date.accessioned    2022-05-25T15:43:49Z
dc.date.available    2024-05-01
dc.date.available    2022-05-25T15:43:49Z
dc.date.issued    2022
dc.date.submitted    2022
dc.identifier.uri    https://hdl.handle.net/2027.42/172770
dc.description.abstract    Deep neural networks (DNNs) have become the primary method for solving machine learning and artificial intelligence problems in computer vision, natural language processing, and robotics. Advances in DNN models are to a large degree attributable to growth in model size, complexity, and versatility. This continuous growth imposes intense memory and compute requirements and complicates hardware design, especially for resource-constrained mobile and smart-sensor platforms. To resolve these resource bottlenecks, model compression techniques, e.g., data quantization, network sparsification, and tensor decomposition, have been used to reduce model size while preserving the accuracy of the original model. However, they introduce several computation challenges, including 1) irregular computation in unstructured sparse neural networks (NNs) produced by network sparsification, and 2) complex and arbitrary tensor orchestration for tensor contraction in tensorized NNs. Meanwhile, DNNs have also been applied to new domains that handle drastically different modalities and non-Euclidean data, e.g., point clouds and graphs, where new computation challenges continue to emerge, for example, irregular memory access for graph-structured data in graph-based point-cloud NNs. These challenges lead to low processing efficiency on existing hardware architectures and motivate the exploration of specialized hardware mechanisms and accelerator architectures. This dissertation consists of three works that design efficient accelerator architectures to overcome these computation challenges by exploiting model compression characteristics and data orchestration techniques.

The first work presents the sparse neural acceleration processor (SNAP) to process sparse NNs resulting from unstructured network pruning. SNAP uses parallel associative search to discover valid weight and input-activation pairs for parallel computation. A two-level partial-sum (psum) reduction dataflow eliminates access contention at the output buffer and cuts psum writeback traffic. The SNAP chip achieves a peak effectual efficiency of 21.55 TOPS/W on sparse workloads and 3.61 TOPS/W on pruned ResNet-50.

The second work presents Point-X, a spatial-locality-aware architecture that exploits the spatial locality in point clouds for efficient graph-based point-cloud NN processing. A clustering method extracts fine-grained and coarse-grained spatial locality from the input point cloud to maximize intra-tile computational parallelism and minimize inter-tile data movement, and a chain network-on-chip (NoC) further reduces data traffic, achieving up to a 3.2x speedup over a traditional mesh NoC. The Point-X prototype achieves a throughput of 1307.1 inference/s and an energy efficiency of 604.5 inference/J on the DGCNN workload.
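As an illustration of the clustering idea behind Point-X, here is a minimal Python sketch that groups a point cloud into tiles with a uniform voxel grid, so points processed together are spatially close. This is an illustrative sketch only, not the dissertation's actual method; the grid-based grouping and the tile_size parameter are assumptions.

```python
import numpy as np

def cluster_points(points: np.ndarray, tile_size: float):
    """Group an (N, 3) point cloud into spatial tiles with a uniform grid.

    Illustrative sketch only: Point-X's actual clustering may differ, and
    tile_size is an assumed tuning parameter.
    """
    # Quantize coordinates: points falling into the same grid cell share a
    # voxel index and end up in the same tile (fine-grained locality).
    voxel_ids = np.floor(points / tile_size).astype(np.int64)
    # Label each point with the id of its (unique) voxel.
    _, tile_of_point = np.unique(voxel_ids, axis=0, return_inverse=True)
    # Collect the point indices belonging to each tile.
    order = np.argsort(tile_of_point, kind="stable")
    counts = np.bincount(tile_of_point)
    return np.split(order, np.cumsum(counts)[:-1])

# Usage: map each tile to one compute cluster so that neighbor search for
# the k-NN graph stays mostly intra-tile, cutting inter-tile data movement.
pts = np.random.default_rng(0).random((1024, 3))
tiles = cluster_points(pts, tile_size=0.25)
```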
The third work presents TetriX, an architecture-mapping co-design for efficient and flexible tensorized NN inference. An optimal contraction sequence that minimizes computation and intermediate memory requirements is identified for inference, and a hybrid mapping scheme eliminates complex orchestration operations by alternating between inner-product and outer-product operations. TetriX further uses index translation and output gathering to support flexible orchestration operations efficiently. It is the first work to support all existing tensor decomposition methods for tensorized NNs and demonstrates up to a 3.9x performance improvement over prior work on tensor-train workloads.

Overall, these three works explore the computation of different network optimization techniques. They exploit the full potential of model compression and novel operations, converting it into hardware performance and efficiency. The architectures can also be used to support and further the development of more effective network models.
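To make the tensorized inference above concrete, here is a minimal Python sketch of a tensor-train (TT) matrix-vector product: a 64x64 weight matrix stored as three small TT cores is applied to an input by contracting one core at a time. This is illustrative only, not TetriX's hardware mapping; the shapes, ranks, and the left-to-right contraction order are arbitrary assumptions. The size of the intermediate tensor depends on the contraction sequence, which is the quantity an optimal sequence minimizes.

```python
import numpy as np

# TT representation of a 64x64 weight matrix, factorized as (4*4*4) x (4*4*4).
# Core k has shape (r_{k-1}, m_k, n_k, r_k) with r_0 = r_3 = 1; the ranks
# [1, 2, 2, 1] are arbitrary illustrative choices.
rng = np.random.default_rng(0)
ranks = [1, 2, 2, 1]
cores = [rng.standard_normal((ranks[k], 4, 4, ranks[k + 1])) for k in range(3)]

def tt_matvec(cores, x):
    # Reshape the length-64 input into its three mode factors (j1, j2, j3)
    # and prepend a dummy rank axis of size r_0 = 1.
    t = x.reshape(1, 4, 4, 4)          # axes: (rank, j1, j2, j3)
    for G in cores:
        # Contract the rank axis and the leading input mode, appending the
        # produced output mode at the end so the next input mode comes to
        # the front. The intermediate size depends on this sequence.
        t = np.einsum("rn...,rmns->s...m", t, G)
    return t.reshape(64)               # axes ended as (i1, i2, i3)

# Sanity check against the dense matrix reconstructed from the cores.
x = rng.standard_normal(64)
W = np.einsum("aijb,bklc,cmnd->ikmjln",
              cores[0], cores[1], cores[2]).reshape(64, 64)
assert np.allclose(tt_matvec(cores, x), W @ x)
```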
dc.language.iso    en_US
dc.subject    Accelerator architecture
dc.subject    Hardware design
dc.subject    Deep neural network
dc.subject    Unstructured sparsity
dc.subject    Point-cloud recognition
dc.subject    Tensor decomposition
dc.title    Efficient Deep Learning Accelerator Architectures by Model Compression and Data Orchestration
dc.type    Thesis
dc.description.thesisdegreename    PhD    en_US
dc.description.thesisdegreediscipline    Electrical and Computer Engineering
dc.description.thesisdegreegrantor    University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember    Zhang, Zhengya
dc.contributor.committeemember    Dreslinski Jr, Ronald
dc.contributor.committeemember    Kim, Hun Seok
dc.contributor.committeemember    Sylvester, Dennis Michael
dc.subject.hlbsecondlevel    Computer Science
dc.subject.hlbsecondlevel    Electrical Engineering
dc.subject.hlbtoplevel    Engineering
dc.description.bitstreamurl    http://deepblue.lib.umich.edu/bitstream/2027.42/172770/1/jfzhang_1.pdf    en
dc.identifier.doi    https://dx.doi.org/10.7302/4799
dc.identifier.orcid    0000-0002-6609-4383
dc.identifier.name-orcid    Zhang, Jie-Fang; 0000-0002-6609-4383    en_US
dc.restrict.um    YES
dc.working.doi    10.7302/4799    en
dc.owningcollname    Dissertations and Theses (Ph.D. and Master's)

