Accelerated Systems for Pattern Matching
dc.contributor.author | Subramaniyan, Arun | |
dc.date.accessioned | 2022-01-19T15:22:15Z | |
dc.date.available | 2022-01-19T15:22:15Z | |
dc.date.issued | 2021 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/171324 | |
dc.description.abstract | Pattern matching forms the core of many applications and contributes to a significant fraction of their execution time. For instance, scanning the human reference genome for motifs and identifying variations between individuals requires processing of 100s of gigabytes of unstructured data and can take several days on a multi-core processor. Parsing activities in the frontend of browsers can account for up to 40% of web page loading time. Datacenter log processing involves the analysis of data generated at the rate of several terabytes every few minutes. While general-purpose processors have been optimized for regular, data-parallel workloads, the class of pattern matching workloads identified above typically employ computational models and data-structures that are not well suited for general-purpose processing. In particular, these workloads perform irregular memory accesses and spend disproportionately more time and energy in moving data from storage to compute units when compared to the actual computation and are often bottlenecked by memory bandwidth. To address these inefficiencies, this dissertation proposes to repurpose existing memories for efficient in-memory pattern matching computation and presents new hardware-software co-design techniques to significantly improve the performance of these pattern matching workloads. First, a new hardware design is proposed that allows embarrassingly sequential finite state automata, a common computation model used for pattern matching, to be executed in parallel in a DRAM-based in-memory accelerator. Next, this dissertation presents the Cache Automaton architecture, which repurposes CPU last-level caches for massively parallel automata processing. This dissertation also takes a deep dive into accelerating genomics analysis, an emerging application domain which heavily relies on pattern matching. In particular, we design custom hardware and present new hardware-friendly data-structures and algorithms to accelerate read alignment, a time-consuming string matching workload in genomics analysis, which matches each of the billions of short fragments of DNA emitted by the sequencer (called reads) against a reference genome. Finally, this dissertation presents a detailed characterization of the pattern matching landscape in genomics and highlights opportunities for the development of new domain-specific architectures customized for genomics analysis. | |
dc.language.iso | en_US | |
dc.subject | Pattern Matching | |
dc.subject | Bioinformatics | |
dc.subject | Genomics | |
dc.subject | Hardware Acceleration | |
dc.subject | Automata | |
dc.subject | Benchmarking | |
dc.title | Accelerated Systems for Pattern Matching | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Computer Science & Engineering | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Das, Reetuparna | |
dc.contributor.committeemember | Blaauw, David | |
dc.contributor.committeemember | Narayanasamy, Satish | |
dc.contributor.committeemember | Weimer, Westley R | |
dc.subject.hlbsecondlevel | Computer Science | |
dc.subject.hlbtoplevel | Engineering | |
dc.subject.hlbtoplevel | Health Sciences | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/171324/1/arunsub_1.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/3836 | |
dc.identifier.orcid | 0000-0001-6119-3182 | |
dc.identifier.name-orcid | Subramaniyan, Arun; 0000-0001-6119-3182 | en_US |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |
Files in this item
Remediation of Harmful Language
The University of Michigan Library aims to describe its collections in a way that respects the people and communities who create, use, and are represented in them. We encourage you to Contact Us anonymously if you encounter harmful or problematic language in catalog records or finding aids. More information about our policies and practices is available at Remediation of Harmful Language.
Accessibility
If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.