Show simple item record

Accelerated Systems for Pattern Matching

dc.contributor.authorSubramaniyan, Arun
dc.date.accessioned2022-01-19T15:22:15Z
dc.date.available2022-01-19T15:22:15Z
dc.date.issued2021
dc.identifier.urihttps://hdl.handle.net/2027.42/171324
dc.description.abstractPattern matching forms the core of many applications and contributes to a significant fraction of their execution time. For instance, scanning the human reference genome for motifs and identifying variations between individuals requires processing of 100s of gigabytes of unstructured data and can take several days on a multi-core processor. Parsing activities in the frontend of browsers can account for up to 40% of web page loading time. Datacenter log processing involves the analysis of data generated at the rate of several terabytes every few minutes. While general-purpose processors have been optimized for regular, data-parallel workloads, the class of pattern matching workloads identified above typically employ computational models and data-structures that are not well suited for general-purpose processing. In particular, these workloads perform irregular memory accesses and spend disproportionately more time and energy in moving data from storage to compute units when compared to the actual computation and are often bottlenecked by memory bandwidth. To address these inefficiencies, this dissertation proposes to repurpose existing memories for efficient in-memory pattern matching computation and presents new hardware-software co-design techniques to significantly improve the performance of these pattern matching workloads. First, a new hardware design is proposed that allows embarrassingly sequential finite state automata, a common computation model used for pattern matching, to be executed in parallel in a DRAM-based in-memory accelerator. Next, this dissertation presents the Cache Automaton architecture, which repurposes CPU last-level caches for massively parallel automata processing. This dissertation also takes a deep dive into accelerating genomics analysis, an emerging application domain which heavily relies on pattern matching. In particular, we design custom hardware and present new hardware-friendly data-structures and algorithms to accelerate read alignment, a time-consuming string matching workload in genomics analysis, which matches each of the billions of short fragments of DNA emitted by the sequencer (called reads) against a reference genome. Finally, this dissertation presents a detailed characterization of the pattern matching landscape in genomics and highlights opportunities for the development of new domain-specific architectures customized for genomics analysis.
dc.language.isoen_US
dc.subjectPattern Matching
dc.subjectBioinformatics
dc.subjectGenomics
dc.subjectHardware Acceleration
dc.subjectAutomata
dc.subjectBenchmarking
dc.titleAccelerated Systems for Pattern Matching
dc.typeThesis
dc.description.thesisdegreenamePhDen_US
dc.description.thesisdegreedisciplineComputer Science & Engineering
dc.description.thesisdegreegrantorUniversity of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeememberDas, Reetuparna
dc.contributor.committeememberBlaauw, David
dc.contributor.committeememberNarayanasamy, Satish
dc.contributor.committeememberWeimer, Westley R
dc.subject.hlbsecondlevelComputer Science
dc.subject.hlbtoplevelEngineering
dc.subject.hlbtoplevelHealth Sciences
dc.description.bitstreamurlhttp://deepblue.lib.umich.edu/bitstream/2027.42/171324/1/arunsub_1.pdf
dc.identifier.doihttps://dx.doi.org/10.7302/3836
dc.identifier.orcid0000-0001-6119-3182
dc.identifier.name-orcidSubramaniyan, Arun; 0000-0001-6119-3182en_US
dc.owningcollnameDissertations and Theses (Ph.D. and Master's)


Files in this item

Show simple item record

Remediation of Harmful Language

The University of Michigan Library aims to describe its collections in a way that respects the people and communities who create, use, and are represented in them. We encourage you to Contact Us anonymously if you encounter harmful or problematic language in catalog records or finding aids. More information about our policies and practices is available at Remediation of Harmful Language.

Accessibility

If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.