Memory and System Aware Architectures for Real-Time Machine Learning
dc.contributor.author | Pinkham, Reid | |
dc.date.accessioned | 2022-01-19T15:31:29Z | |
dc.date.available | 2024-01-01 | |
dc.date.available | 2022-01-19T15:31:29Z | |
dc.date.issued | 2021 | |
dc.date.submitted | 2021 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/171438 | |
dc.description.abstract | The field of Machine Learning (ML) has grown explosively, enabled by the widespread availability of continually faster computing hardware. These new ML algorithms are increasingly used in real-time applications, which introduces new challenges for the computational hardware. Real-time operation requires moving the computation closer to the data source, which places more importance on the efficiency and latency of computation. This thesis consists of four parts, which introduce a mix of algorithmic and hardware advancements to enable real-time ML computation. The first part focuses on the task of performing a k-Nearest-Neighbor (kNN) search on unordered point clouds for autonomous vehicle applications. I present QuickNN, an FPGA-based accelerator that can handle real-time kNN point cloud processing. QuickNN uses strategically placed caches and ordered external memory placement to alleviate the limited external memory bandwidth. It also introduces an efficient tree storage and accompanying traversal method to reduce the bottleneck of tree manipulation. The second part introduces a lightweight CNN-based compression algorithm that can be used on high-frame-rate streamed video data. This work aims to address the challenge of compressing video data quickly with relatively low overhead using a Convolutional Neural Network (CNN). Compared to previous CNN-based compression schemes, the presented method has compression complexity comparable to state-of-the-art traditional compression schemes, but has the advantage of near-zero overhead in decompression. The third part presents an in-depth design space exploration of the multi-processor Augmented and Virtual Reality (AR/VR) device. This work introduces an example AR/VR platform and analyzes the trade-offs associated with splitting CNN computation between the multiple processors on the device, including the small on-sensor processors. 
Using these insights, some straightforward design rules are presented and shown to yield nearly optimal processor specification and algorithm mapping. Finally, two real-world processor limitations are discussed, along with how they impact the algorithm mapping and the most suitable types of processors. The fourth part presents a design for a near-sensor CNN processing architecture that adapts to a dynamically varying workload. The presented architecture is intended for near-sensor compute and comprises a scalable processor and stacked high-density non-volatile memories (NVM), which store the CNN weights and can be power gated at run time to save energy. The processing architecture consists of multiple connected tiles, each with multiple vector-matrix multiplier (VMM) units. By supporting multiple mapping methods, dataflow schemes, and fine-grained power gating, the processing architecture can efficiently adapt to a wide range of real-time workloads. We demonstrate that the same architecture can be scaled in size to fit the design envelope of the system while maintaining efficiency, and we quantify the impact of individual architecture improvements over a standard SIMD-based design. Together, these four works connect aspects of real-time system design that are important for a diverse set of future applications. As the applications of ML algorithms continue to expand, so must the supporting compute architectures. Advancement of real-time architectures will enable the next wave of computing platforms, from autonomous vehicles to wearable AR devices, which will continue to lead to a safer and more connected world. | |
dc.language.iso | en_US | |
dc.subject | Machine Learning | |
dc.subject | Computer Architecture | |
dc.subject | Real-time computation | |
dc.subject | Augmented and Virtual Reality (AR/VR) | |
dc.title | Memory and System Aware Architectures for Real-Time Machine Learning | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Electrical and Computer Engineering | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Zhang, Zhengya | |
dc.contributor.committeemember | Hayes, John Patrick | |
dc.contributor.committeemember | Kim, Hun Seok | |
dc.contributor.committeemember | Sylvester, Dennis Michael | |
dc.subject.hlbsecondlevel | Computer Science | |
dc.subject.hlbtoplevel | Engineering | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/171438/1/pinkhamr_1.pdf | en |
dc.identifier.doi | https://dx.doi.org/10.7302/3950 | |
dc.identifier.orcid | 0000-0001-6869-0261 | |
dc.identifier.name-orcid | Pinkham, Reid; 0000-0001-6869-0261 | en_US |
dc.restrict.um | YES | |
dc.working.doi | 10.7302/3950 | en |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |