Acceleration Techniques of Sparse Linear Algebra on Emerging Architectures
Feng, Siying
2022
Abstract
Recent years have witnessed a tremendous surge of interest in accelerating sparse linear algebra applications. Sparse linear algebra is a fundamental building block, and often the performance bottleneck, of a wide range of applications, such as machine learning, graph processing, and scientific computing. Optimizing sparse linear algebra kernels is thus critical for the efficient computation of these workloads. The key challenge of sparse linear algebra lies in the irregular access patterns induced by the sparse nature of the data, which render the deep cache hierarchy in General-Purpose Processors (GPPs) ineffective and make sparse linear algebra applications notoriously memory intensive. This dissertation proposes multiple approaches to optimize the performance and efficiency of sparse linear algebra kernels by taking advantage of emerging architecture techniques, including hardware specialization, architecture reconfiguration, and Near-Memory Processing (NMP). Aiming for a balance among efficiency, flexibility, and programmability in architecture design, this dissertation first proposes Transmuter, a reconfigurable architecture that features massively parallel Processing Elements (PEs) and a flexible on-chip memory hierarchy that reconfigures the memory type, resource sharing, and dataflow at runtime to adapt to different applications. Transmuter demonstrates significant efficiency gains over the CPU and GPU across a diverse set of commonly used kernels while offering GPU-like programmability. More importantly, Transmuter retains high performance for sparse linear algebra kernels, achieving an energy efficiency within 4.1x of state-of-the-art functionally-equivalent Application-Specific Integrated Circuits (ASICs).
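The irregular access pattern described above can be seen in the inner loop of Sparse Matrix-Vector Multiplication (SpMV) over a Compressed Sparse Row (CSR) matrix: the indirect read through the column-index array is data-dependent, so it defeats caches and prefetchers built for regular strides. The sketch below is purely illustrative (it is not code from the dissertation or from Transmuter):

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x for a CSR matrix A (illustrative sketch only).

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[j] and vals[j] give each nonzero's column and value.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            # x[col_idx[j]] is an indirect, data-dependent load:
            # this is the irregular access that makes SpMV memory-bound.
            y[i] += vals[j] * x[col_idx[j]]
    return y

# 3x3 example: A = [[1,0,2],[0,3,0],[4,0,5]], x = [1,1,1]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
print(spmv_csr(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```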
As the algorithm mapping and hardware configuration play a crucial role in the performance of Transmuter, the next piece of this dissertation proposes the CoSPARSE framework, which guides the runtime reconfiguration of Transmuter to achieve the best performance for graph analytics workloads. During execution, CoSPARSE intelligently reconfigures to the best-performing software algorithm and hardware configuration for Transmuter based on the input characteristics. The synergistic software and hardware reconfiguration yields a net speedup of up to 2.0x over a naive baseline implementation with no software or hardware reconfiguration. Compared to a recent graph processing framework on a server-class CPU, CoSPARSE achieves an average speedup and energy efficiency improvement of 1.5x and 404.4x, respectively, across a suite of widely used graph algorithms. The dynamic algorithm reconfiguration of CoSPARSE and many other graph frameworks requires the input graph to be stored in multiple data formats to avoid runtime transposition, trading off storage for performance. As data sizes keep growing, to prevent designs like CoSPARSE from incurring expensive disk accesses when memory storage is limited, the final part of this dissertation presents MeNDA, a scalable near-memory accelerator for sparse matrix transposition and sparse merging dataflows. Because MeNDA is built on the widely applicable primitive of multi-way merge sorting, it is easily adapted to other sparse kernels such as Sparse Matrix-Vector Multiplication (SpMV). Compared to two state-of-the-art CPU implementations of sparse matrix transposition and a sparse library on a GPU, MeNDA achieves speedups of 19.1x, 12.0x, and 7.7x, respectively. Because MeNDA greatly reduces the runtime transposition overhead, integrating MeNDA can save reconfigurable graph frameworks such as CoSPARSE from storing two or more copies of the graph in main memory, at a minor power overhead of 78.6 mW per rank for commodity DRAM devices.
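The multi-way-merge view of sparse transposition mentioned above can be sketched in software: each CSR row is already a run sorted by column index, so merging all rows as (column, row, value) triples produces the nonzeros of the transpose in CSR order. This is a minimal functional sketch of the dataflow only, assuming a `heapq.merge`-style software merger in place of MeNDA's near-memory hardware merge trees:

```python
import heapq

def transpose_csr(row_ptr, col_idx, vals, n_cols):
    """Transpose a CSR matrix by multi-way merging its per-row sorted runs.

    Illustrative sketch of the merge dataflow, not MeNDA's hardware design.
    Returns (t_ptr, t_idx, t_vals): the transpose, also in CSR form.
    """
    n_rows = len(row_ptr) - 1
    # Each row's nonzeros, as (col, row, val) triples, form a sorted run.
    runs = [
        [(col_idx[j], i, vals[j]) for j in range(row_ptr[i], row_ptr[i + 1])]
        for i in range(n_rows)
    ]
    t_ptr, t_idx, t_vals = [0], [], []
    cur = 0
    # The k-way merge emits nonzeros ordered by (col, row): exactly the
    # nonzero order of the transposed matrix, one output column at a time.
    for col, row, val in heapq.merge(*runs):
        while cur < col:               # close out columns with no more nonzeros
            t_ptr.append(len(t_idx))
            cur += 1
        t_idx.append(row)
        t_vals.append(val)
    while cur < n_cols:                # close out trailing empty columns
        t_ptr.append(len(t_idx))
        cur += 1
    return t_ptr, t_idx, t_vals

# A = [[1,0,2],[0,3,0],[4,0,5]]  ->  A^T = [[1,0,4],[0,3,0],[2,0,5]]
print(transpose_csr([0, 2, 3, 5], [0, 2, 1, 0, 2],
                    [1.0, 2.0, 3.0, 4.0, 5.0], 3))
```

Swapping the merge key to accumulate partial products instead of reordering triples is, in the same spirit, how a merge-based engine also serves SpMV and other sparse merging dataflows.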
Subjects
Computer Architecture for Sparse Linear Algebra Applications; Coarse-Grained Reconfigurable Architecture; Intelligent Runtime Architecture Reconfiguration; Near-Memory Processing; Graph Analytics; Sparse Matrix Transposition; Sparse Multi-way Merge Sorting
Types
Thesis