Efficient, Reconfigurable, and QoS-Aware Systems for Deep Neural Networks
dc.contributor.author | Wang, Xiaowei | |
dc.date.accessioned | 2023-01-30T16:12:04Z | |
dc.date.available | 2023-01-30T16:12:04Z | |
dc.date.issued | 2022 | |
dc.date.submitted | 2022 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/175658 | |
dc.description.abstract | Deep Neural Networks (DNNs) are an important machine learning application that places high compute and memory-bandwidth demands on the underlying computer systems. Prior works have proposed domain-specific accelerators for DNNs; in-memory computing architectures are especially promising because they can provide high memory bandwidth and high computing throughput at the same time. The first part of this dissertation presents an in-cache computing system for DNN acceleration. While this in-cache computing system is highly efficient, its hardware architecture lacks the flexibility to be optimally reconfigured for different DNN models. The second part of the dissertation therefore proposes a reconfigurable in-SRAM computing DNN accelerator based on the block RAMs (BRAMs) on FPGAs. We propose circuit changes to the BRAM that enable bit-serial in-memory computing, turning BRAMs into both bit-serial vector units and data storage. Building on the compute-capable BRAMs, we further propose customized accelerator instances for different DNN models, which outperform a state-of-the-art DNN accelerator on FPGAs. DNN workloads can also run on general-purpose CPUs in datacenters. Cache compression is a technique that reduces the cache miss rate on CPUs, which benefits DNNs as well as many other applications. In the third part of the dissertation, we present a novel method to compress cache data with efficient in-SRAM data comparison. Further, as datacenters frequently collocate multiple workloads to increase server utilization, the Quality-of-Service (QoS) of DNN workloads, such as latency, can be degraded. The final part of the dissertation proposes a systematic approach to meeting the QoS targets of DNNs under collocation, using resource partitioning to reduce interference and a proposed latency prediction model to choose the partition that satisfies the QoS requirement. | |
dc.language.iso | en_US | |
dc.subject | in-memory computing | |
dc.subject | deep neural network | |
dc.subject | cache | |
dc.title | Efficient, Reconfigurable, and QoS-Aware Systems for Deep Neural Networks | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Computer Science & Engineering | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Das, Reetuparna | |
dc.contributor.committeemember | Zhang, Zhengya | |
dc.contributor.committeemember | Mahlke, Scott | |
dc.contributor.committeemember | Tang, Lingjia | |
dc.subject.hlbsecondlevel | Computer Science | |
dc.subject.hlbtoplevel | Engineering | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/175658/1/xiaoweiw_1.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/6872 | |
dc.identifier.orcid | 0000-0002-5883-7327 | |
dc.identifier.name-orcid | Wang, Xiaowei; 0000-0002-5883-7327 | en_US |
dc.working.doi | 10.7302/6872 | en |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |