Efficient, Reconfigurable, and QoS-Aware Systems for Deep Neural Networks
dc.contributor.author | Wang, Xiaowei | |
dc.date.accessioned | 2023-01-30T16:12:04Z | |
dc.date.available | 2023-01-30T16:12:04Z | |
dc.date.issued | 2022 | |
dc.date.submitted | 2022 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/175658 | |
dc.description.abstract | Deep Neural Networks (DNNs) are an important machine learning application that places high compute and memory-bandwidth demands on the underlying computer systems. Prior works have proposed domain-specific accelerators for DNNs; in-memory computing architectures are especially promising because they can provide high memory bandwidth and high computing throughput at the same time. The first part of this dissertation presents an in-cache computing system for DNN acceleration. While this in-cache computing system is highly efficient, its hardware architecture lacks the flexibility to be optimally reconfigured for different DNN models. The second part of the dissertation therefore proposes a reconfigurable in-SRAM computing DNN accelerator based on the block RAMs (BRAMs) on FPGAs. We propose circuit changes to the BRAM that enable bit-serial in-memory computing, turning BRAMs into both bit-serial vector units and data storage. Building on the compute-capable BRAMs, we further propose customized accelerator instances for different DNN models, which outperform a state-of-the-art DNN accelerator on FPGAs. DNN workloads can also run on general-purpose CPUs in datacenters. Cache compression is a technique that reduces the cache miss rate on CPUs, which benefits DNNs as well as many other applications. In the third part of the dissertation, we present a novel method to compress cache data with efficient in-SRAM data comparison. Further, as datacenters frequently collocate multiple workloads to increase server utilization, the Quality-of-Service (QoS) of DNN workloads, such as latency, can be degraded. The final part of the dissertation proposes a systematic approach to meeting the QoS targets of DNNs under collocation, using resource partitioning to reduce interference and a proposed latency prediction model to choose the partition that satisfies the QoS requirement. | |
dc.language.iso | en_US | |
dc.subject | in-memory computing | |
dc.subject | deep neural network | |
dc.subject | cache | |
dc.title | Efficient, Reconfigurable, and QoS-Aware Systems for Deep Neural Networks | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Computer Science & Engineering | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Das, Reetuparna | |
dc.contributor.committeemember | Zhang, Zhengya | |
dc.contributor.committeemember | Mahlke, Scott | |
dc.contributor.committeemember | Tang, Lingjia | |
dc.subject.hlbsecondlevel | Computer Science | |
dc.subject.hlbtoplevel | Engineering | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/175658/1/xiaoweiw_1.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/6872 | |
dc.identifier.orcid | 0000-0002-5883-7327 | |
dc.identifier.name-orcid | Wang, Xiaowei; 0000-0002-5883-7327 | en_US |
dc.working.doi | 10.7302/6872 | en |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |