Efficient Utilization of Heterogeneous Compute and Memory Systems
Kassa, Hiwot Tadese
2022
Abstract
The scaling of conventional compute and memory systems to achieve higher performance at lower cost and power has diminished. At the same time, diverse compute- and memory-demanding workloads continue to grow and stress traditional systems built only from CPUs and DRAM. Heterogeneous compute and memory systems create an opportunity to boost performance for these demanding workloads by providing hardware units with specialized characteristics. Specialized compute platforms such as GPUs, FPGAs, and accelerators execute specific tasks faster than CPUs, increasing performance and energy efficiency for those tasks. Heterogeneity in the memory system, such as incorporating storage class memories (SCMs) alongside DRAM, provides denser, lower-power, and lower-cost memory to accommodate data-intensive applications. However, heterogeneous systems have unique characteristics compared to traditional systems, and workloads must be carefully mapped onto these units to harness their full benefits. This dissertation presents software and hardware techniques that maximize the performance, energy, and cost efficiency of heterogeneous systems based on the compute and memory access patterns of various application domains.

First, this thesis proposes ChipAdvisor, a machine learning-based framework that identifies the best platform for an application in the early steps of system design. ChipAdvisor considers intrinsic application characteristics such as parallelism, locality, and synchronization patterns, and achieves 98% and 94% accuracy in predicting the most performant and most energy-efficient platform, respectively, for diverse workloads on a system with a CPU, GPU, and FPGA.

Second, we propose a heterogeneous memory-enabled system design with DRAM and storage class memory (SCM) for key-value stores, one of the largest workloads in data centers. We characterize an extensive deployment of key-value stores in a commercial data center and design optimal server configurations with heterogeneous memories, achieving an 80% performance increase over a single-socket platform while reducing the total cost of ownership (TCO) by 43-48% compared to a two-socket platform.

Third, this dissertation designs MTrainS, an end-to-end recommendation system trainer that utilizes heterogeneous compute and memory systems. MTrainS efficiently divides recommendation model training tasks between CPUs and GPUs based on their compute patterns. It then hierarchically utilizes memory types such as HBM, DRAM, and SCM by studying the temporal locality and bandwidth requirements of recommendation system models in data centers. MTrainS reduces the number of hosts used for training by up to 8×, decreasing the power and cost of training.

Lastly, this dissertation proposes CoACT, which provides fine-grained cache and memory sharing for collaborative workloads running on integrated CPU-GPU systems. CoACT uses the collaborative pattern of applications to fine-tune cache partitioning and interconnect and memory controller utilization between the CPU and GPU, improving performance by 25%.
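To make the ChipAdvisor idea concrete, the minimal sketch below trains a classifier on application-level features (degree of parallelism, data locality, synchronization intensity) to predict the most performant platform. The feature set, training profiles, and choice of model here are illustrative assumptions, not the framework's actual design.

# Hypothetical sketch of a ChipAdvisor-style platform predictor.
# Feature values and labels below are made up for illustration only.
from sklearn.ensemble import RandomForestClassifier

# Each profile: [parallelism, locality, synchronization intensity], all in [0, 1].
profiles = [
    [0.95, 0.30, 0.05],   # massively parallel, little sync
    [0.20, 0.85, 0.60],   # mostly serial, cache friendly
    [0.70, 0.60, 0.10],   # regular dataflow, moderate parallelism
]
best_platform = ["GPU", "CPU", "FPGA"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(profiles, best_platform)

# Predict the best-performing platform for a new application profile.
print(model.predict([[0.90, 0.40, 0.08]]))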
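As a rough illustration of the hierarchical memory placement MTrainS performs, the sketch below greedily assigns the hottest embedding tables to the fastest tier (HBM), spilling colder or oversized tables to DRAM and then SCM. The tier capacities, table sizes, access counts, and greedy policy are assumptions for illustration, not the trainer's actual placement algorithm.

# Hypothetical sketch of hierarchical embedding-table placement across HBM, DRAM, and SCM.
def place_tables(tables, hbm_gb=32, dram_gb=256):
    """Greedily assign the hottest tables to the fastest memory tier.

    `tables` is a list of (name, size_gb, accesses_per_step) tuples.
    Returns a dict mapping table name to "HBM", "DRAM", or "SCM".
    """
    placement = {}
    # Hotter tables first, so they land in the highest-bandwidth tier that fits.
    for name, size_gb, hotness in sorted(tables, key=lambda t: t[2], reverse=True):
        if size_gb <= hbm_gb:
            placement[name], hbm_gb = "HBM", hbm_gb - size_gb
        elif size_gb <= dram_gb:
            placement[name], dram_gb = "DRAM", dram_gb - size_gb
        else:
            placement[name] = "SCM"   # cold or oversized tables spill to SCM
    return placement

print(place_tables([("user_id", 40, 9e6), ("item_id", 20, 8e6), ("zip", 5, 1e4)]))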
Subjects
Heterogeneous systems, memory subsystem, GPU, FPGA, storage class memories, data center, key-value stores, recommendation systems
Types
Thesis