Show simple item record

In-Memory Acceleration for General Data Parallel Applications

dc.contributor.author: Fujiki, Daichi
dc.date.accessioned: 2022-05-25T15:17:33Z
dc.date.available: 2022-05-25T15:17:33Z
dc.date.issued: 2022
dc.date.submitted: 2022
dc.identifier.uri: https://hdl.handle.net/2027.42/172509
dc.description.abstract: General-purpose processors and accelerators, including systems-on-a-chip and graphics processing units, are composed of three principal components: the processor, the memory, and the interconnect between them. This simple but powerful architecture model has been the basis of computer architecture for decades. However, the recent data-intensive trend in computational workloads has exposed bottlenecks in this fundamental paradigm. Studies show that data communication takes 1,000x the time and 40x the power of arithmetic performed in the processor. Processing-in-Memory (PIM) has long been an attractive idea with the potential to break the well-known memory-wall problem: PIM moves compute logic near the memory and thereby reduces data movement. In contrast, certain memories have been shown to morph themselves into compute units by exploiting the physical properties of their memory cells, making them intrinsically more efficient than PIM. Modern computing systems devote a large portion (more than 90%) of aggregate die area to passive memories; re-purposing them as active compute units therefore brings substantial benefits. However, prior work has provided only low-level interfaces for computation or has relied on manually mapping machine learning kernels to compute-capable memories. The main goal of this dissertation is to extend the compute capability of memory arrays and make them applicable to a wide range of data-parallel applications. First, a processor architecture is proposed that re-purposes resistive memory to support data-parallel in-memory computation. The proposed execution model exposes the parallelism available in a memory array through a programming model that merges the concepts of data-flow and vector processing, empowered by a compiler that transforms the data-flow graphs of tensor programs into a set of data-parallel code modules expressed in a memory ISA.
Second, this dissertation presents the Duality Cache architecture, which flexibly transforms caches on demand into an in-memory accelerator that can execute arbitrary data-parallel programs. The proposed architecture adopts the SIMT execution model and uses the CUDA/OpenACC frameworks as its programming frontend. We develop a backend compiler that compiles PTX, the intermediate representation for CUDA, to the proposed architecture. Finally, this dissertation presents a multi-layer in-memory computing framework. In-memory computing can be implemented across multiple layers of the memory hierarchy, and in such a system deciding where to compute is an important question. We propose a framework that determines the appropriate level of the memory hierarchy for in-memory computing and maximizes resource utilization. We compare the performance and energy efficiency of our in-memory accelerators against a server-class CPU and GPU on a variety of data-parallel applications. Our experimental results show that in-ReRAM computing achieves a 7.5x average speedup on PARSEC applications, and in-SRAM computing achieves a 3.6x average speedup on Rodinia applications. Multi-layer in-memory computing provides an overall speedup of 4.8x on Graph Neural Network applications with significant workload dynamism. Our multi-faceted approach, composed of enhanced arithmetic operations, parallel programming models with compilers, and parallel execution models, unlocks the massive compute capability and energy efficiency of in-memory computing for general data-parallel applications.
dc.language.iso: en_US
dc.subject: In-memory computing
dc.subject: Processing in memory
dc.subject: Data parallel computing
dc.subject: Computer architecture
dc.title: In-Memory Acceleration for General Data Parallel Applications
dc.type: Thesis
dc.description.thesisdegreename: PhD (en_US)
dc.description.thesisdegreediscipline: Computer Science & Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Das, Reetuparna
dc.contributor.committeemember: Sylvester, Dennis Michael
dc.contributor.committeemember: Mahlke, Scott
dc.contributor.committeemember: Mudge, Trevor N
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbtoplevel: Engineering
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/172509/1/dfujiki_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/4538
dc.identifier.orcid: 0000-0001-7949-0417
dc.identifier.name-orcid: Fujiki, Daichi; 0000-0001-7949-0417 (en_US)
dc.working.doi: 10.7302/4538 (en)
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
