Memory and System Aware Architectures for Real-Time Machine Learning

Pinkham, Reid

dc.contributor.author	Pinkham, Reid
dc.date.accessioned	2022-01-19T15:31:29Z
dc.date.available	2024-01-01
dc.date.available	2022-01-19T15:31:29Z
dc.date.issued	2021
dc.date.submitted	2021
dc.identifier.uri	https://hdl.handle.net/2027.42/171438
dc.description.abstract	There has been an explosion of growth in the field of Machine Learning (ML), enabled by the widespread availability of continually faster computing hardware. These new ML algorithms are increasingly used in real-time applications, which introduces new challenges for the computational hardware. Real-time operation requires moving the computation closer to the data source, which places more importance on the efficiency and latency of computation. This thesis comprises four parts, which introduce a mix of algorithmic and hardware advancements to enable real-time ML computation. The first part focuses on the task of performing a k-Nearest-Neighbor (kNN) search on unordered point clouds for autonomous vehicle applications. I present QuickNN, an FPGA-based accelerator which can handle real-time kNN point cloud processing. QuickNN uses strategically placed caches and ordered external memory placement to alleviate the limited external memory bandwidth. It also introduces an efficient tree storage and accompanying traversal method to reduce the bottleneck of tree manipulation. The second part introduces a lightweight CNN-based compression algorithm which can be used on high-frame-rate streamed video data. This work aims to address the challenge of compressing video data quickly with relatively low overhead using a Convolutional Neural Network (CNN). Compared to previous CNN-based compression schemes, the presented method has compression complexity comparable to state-of-the-art traditional compression schemes, but has the advantage of near-zero overhead in decompression. The third part presents an in-depth design space exploration of the multi-processor Augmented and Virtual Reality (AR/VR) device. This work introduces an example AR/VR platform and analyzes the trade-offs associated with splitting CNN computation between the multiple processors on the device, including the small on-sensor processors.
Using these insights, some straightforward design rules are presented and shown to yield nearly optimal processor specification and algorithm mapping. Finally, two real-world processor limitations are discussed, along with how they impact the algorithm mapping and the most suitable types of processors. The fourth part presents a design for a near-sensor CNN processing architecture that adapts to a dynamically varying workload. The presented architecture is intended for near-sensor compute consisting of a scalable processor and stacked high-density non-volatile memories (NVM), which store the CNN weights and can be power gated at run time to save energy. The processing architecture consists of multiple connected tiles, each with multiple vector-matrix multiplier (VMM) units. By supporting multiple mapping methods, dataflow schemes, and fine-grained power gating, the processing architecture can efficiently adapt to a wide range of real-time workloads. We demonstrate that the same architecture can be scaled in size to fit the design envelope of the system while maintaining efficiency, and we quantify the impact of individual architecture improvements over a standard SIMD-based design. Together, these four works tie together aspects of real-time system design which are important for a diverse set of future applications. As the applications of ML algorithms continue to expand, so must the supporting compute architectures. Advancement of real-time architectures will enable the next wave of computing platforms, from autonomous vehicles to wearable AR devices, which will continue to lead to a safer and more connected world.
dc.language.iso	en_US
dc.subject	Machine Learning
dc.subject	Computer Architecture
dc.subject	Real-time computation
dc.subject	Augmented and Virtual Reality (AR/VR)
dc.title	Memory and System Aware Architectures for Real-Time Machine Learning
dc.type	Thesis
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Electrical and Computer Engineering
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Zhang, Zhengya
dc.contributor.committeemember	Hayes, John Patrick
dc.contributor.committeemember	Kim, Hun Seok
dc.contributor.committeemember	Sylvester, Dennis Michael
dc.subject.hlbsecondlevel	Computer Science
dc.subject.hlbtoplevel	Engineering
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/171438/1/pinkhamr_1.pdf	en
dc.identifier.doi	https://dx.doi.org/10.7302/3950
dc.identifier.orcid	0000-0001-6869-0261
dc.identifier.name-orcid	Pinkham, Reid; 0000-0001-6869-0261	en_US
dc.restrict.um	YES
dc.working.doi	10.7302/3950	en
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: pinkhamr_1.pdf
Size: 8.430MB
Format: PDF
