Hardware Acceleration for Unstructured Big Data and Natural Language Processing.
dc.contributor.author | Tandon, Prateek | en_US |
dc.date.accessioned | 2016-01-13T18:04:56Z | |
dc.date.available | NO_RESTRICTION | en_US |
dc.date.available | 2016-01-13T18:04:56Z | |
dc.date.issued | 2015 | en_US |
dc.date.submitted | 2015 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/116712 | |
dc.description.abstract | The confluence of the rapid growth in electronic data in recent years and the renewed interest in domain-specific hardware accelerators presents exciting technical opportunities. Traditional scale-out solutions for processing vast amounts of text data have been shown to be energy- and cost-inefficient. In contrast, custom hardware accelerators can provide higher throughput, lower latency, and significant energy savings. In this thesis, I present a set of hardware accelerators for unstructured big-data processing and natural language processing. The first accelerator, called HAWK, aims to speed up the processing of ad hoc queries against large in-memory logs. HAWK is motivated by the observation that traditional software-based tools for processing large text corpora use memory bandwidth inefficiently due to software overheads and thus fall far short of the peak scan rates possible on modern memory systems. HAWK is designed to process data at a constant rate of 32 GB/s, faster than most extant memory systems. I demonstrate that HAWK outperforms state-of-the-art software solutions for text processing, by almost an order of magnitude in many cases. HAWK occupies an area of 45 mm² in its Pareto-optimal configuration and consumes 22 W of power, well within the area and power envelopes of modern CPU chips. The second accelerator I propose aims to speed up similarity-measurement calculations for semantic search in the natural language processing space. By leveraging the latency-hiding concepts of multithreading and simple scheduling mechanisms, my design maximizes functional-unit utilization. This similarity-measurement accelerator provides speedups of 36x-42x over optimized software running on server-class cores, while requiring 56x-58x lower energy and only 1.3% of the area. | en_US |
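The abstract above refers to "similarity-measurement calculations for semantic search" but does not specify the measure in this record. As a purely illustrative sketch of the kind of kernel such an accelerator targets, here is a cosine similarity over sparse term-weight vectors in Python (the function name and the dict-based vector representation are assumptions, not the thesis's actual implementation):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight vectors,
    represented as dicts mapping term -> weight."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy example: one document vector and one query vector.
doc = {"hardware": 0.8, "accelerator": 0.6}
query = {"accelerator": 1.0}
score = cosine_similarity(doc, query)  # 0.6
```

In software, each document-query pair costs a pass over both vectors; the thesis's accelerator hides the memory latency of such passes with multithreading and simple scheduling to keep its functional units busy.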
dc.language.iso | en_US | en_US |
dc.subject | hardware accelerators | en_US |
dc.subject | unstructured big data | en_US |
dc.subject | natural language processing | en_US |
dc.subject | unstructured log processing | en_US |
dc.title | Hardware Acceleration for Unstructured Big Data and Natural Language Processing. | en_US |
dc.type | Thesis | en_US |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Computer Science and Engineering | en_US |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | en_US |
dc.contributor.committeemember | Wenisch, Thomas F. | en_US |
dc.contributor.committeemember | Sylvester, Dennis Michael | en_US |
dc.contributor.committeemember | Cafarella, Michael John | en_US |
dc.contributor.committeemember | Tang, Lingjia | en_US |
dc.subject.hlbsecondlevel | Computer Science | en_US |
dc.subject.hlbtoplevel | Engineering | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/116712/1/prateekt_1.pdf | |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |