Accelerated Systems for Pattern Matching

Subramaniyan, Arun

Accelerated Systems for Pattern Matching

dc.contributor.author	Subramaniyan, Arun
dc.date.accessioned	2022-01-19T15:22:15Z
dc.date.available	2022-01-19T15:22:15Z
dc.date.issued	2021
dc.identifier.uri	https://hdl.handle.net/2027.42/171324
dc.description.abstract	Pattern matching forms the core of many applications and contributes to a significant fraction of their execution time. For instance, scanning the human reference genome for motifs and identifying variations between individuals requires processing of 100s of gigabytes of unstructured data and can take several days on a multi-core processor. Parsing activities in the frontend of browsers can account for up to 40% of web page loading time. Datacenter log processing involves the analysis of data generated at the rate of several terabytes every few minutes. While general-purpose processors have been optimized for regular, data-parallel workloads, the class of pattern matching workloads identified above typically employ computational models and data-structures that are not well suited for general-purpose processing. In particular, these workloads perform irregular memory accesses and spend disproportionately more time and energy in moving data from storage to compute units when compared to the actual computation and are often bottlenecked by memory bandwidth. To address these inefficiencies, this dissertation proposes to repurpose existing memories for efficient in-memory pattern matching computation and presents new hardware-software co-design techniques to significantly improve the performance of these pattern matching workloads. First, a new hardware design is proposed that allows embarrassingly sequential finite state automata, a common computation model used for pattern matching, to be executed in parallel in a DRAM-based in-memory accelerator. Next, this dissertation presents the Cache Automaton architecture, which repurposes CPU last-level caches for massively parallel automata processing. This dissertation also takes a deep dive into accelerating genomics analysis, an emerging application domain which heavily relies on pattern matching. In particular, we design custom hardware and present new hardware-friendly data-structures and algorithms to accelerate read alignment, a time-consuming string matching workload in genomics analysis, which matches each of the billions of short fragments of DNA emitted by the sequencer (called reads) against a reference genome. Finally, this dissertation presents a detailed characterization of the pattern matching landscape in genomics and highlights opportunities for the development of new domain-specific architectures customized for genomics analysis.
dc.language.iso	en_US
dc.subject	Pattern Matching
dc.subject	Bioinformatics
dc.subject	Genomics
dc.subject	Hardware Acceleration
dc.subject	Automata
dc.subject	Benchmarking
dc.title	Accelerated Systems for Pattern Matching
dc.type	Thesis
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Computer Science & Engineering
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Das, Reetuparna
dc.contributor.committeemember	Blaauw, David
dc.contributor.committeemember	Narayanasamy, Satish
dc.contributor.committeemember	Weimer, Westley R
dc.subject.hlbsecondlevel	Computer Science
dc.subject.hlbtoplevel	Engineering
dc.subject.hlbtoplevel	Health Sciences
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/171324/1/arunsub_1.pdf
dc.identifier.doi	https://dx.doi.org/10.7302/3836
dc.identifier.orcid	0000-0001-6119-3182
dc.identifier.name-orcid	Subramaniyan, Arun; 0000-0001-6119-3182	en_US
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name:: arunsub_1.pdf
Size:: 8.155MB
Format:: PDF

View/Open

Dissertations and Theses (Ph.D. and Master's)

Show simple item record

Remediation of Harmful Language

The University of Michigan Library aims to describe its collections in a way that respects the people and communities who create, use, and are represented in them. We encourage you to Contact Us anonymously if you encounter harmful or problematic language in catalog records or finding aids. More information about our policies and practices is available at Remediation of Harmful Language.

Accessibility

If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.