
Application-Aware Scheduling in Deep Learning Software Stacks

dc.contributor.author: Yu, Peifeng
dc.date.accessioned: 2022-09-06T15:59:03Z
dc.date.available: 2022-09-06T15:59:03Z
dc.date.issued: 2022
dc.date.submitted: 2022
dc.identifier.uri: https://hdl.handle.net/2027.42/174199
dc.description.abstract: Deep learning (DL) has pervaded many areas of computing due to the confluence of explosive growth in large-scale computing capabilities, the availability of datasets, and advances in learning techniques. However, the infrastructure that supports DL is still in its early stages, bearing mismatches among the hardware, the software stack, and DL applications. On the one hand, despite the emergence of unique new hardware and new use cases, the software stack that abstracts and schedules these hardware resources remains largely unchanged. On the other hand, the user-defined performance metrics common in DL applications call for schedulers tailored to each application's specific needs. Motivated by these mismatches, this dissertation revisits system design across the stack, focusing on the synergy between schedulers and application- and system-specific information.

At the bottom of the stack, the ever-growing adoption of specialized hardware such as GPUs poses challenges to efficient usage. Because there is no operating-system arbitration, applications usually assume exclusive access, making otherwise underutilized resources unusable for other jobs on the same host. We therefore design Salus to enable efficient GPU sharing. It leverages DL applications' specific usage patterns to schedule iterations and manage memory allocations, providing two missing primitives: fast job switching and memory sharing.

Even with an efficient execution platform, however, harvesting the hardware's full potential for higher-level applications is not trivial. We investigate two such cases on opposite ends of a model's lifecycle: hyperparameter tuning and inference serving. Hyperparameter tuning, which constitutes a large portion of DL cluster usage given the proliferation of distributed resources in clusters, generates many small, interdependent training trials. Existing tuning algorithms are oblivious to advanced execution strategies such as intra-GPU sharing and inter-GPU execution, often causing poor resource utilization. Hence, we propose Fluid, a generalized hyperparameter-tuning execution engine that coordinates between tuning jobs and cluster resources. Fluid schedules training trials using a water-filling approach to make the best use of resources at both intra- and inter-GPU granularity, thereby speeding up hyperparameter tuning.

Inference serving likewise requires careful scheduling to achieve tight latency guarantees while maintaining high utilization. Existing serving solutions assume inference execution times are data-independent and thus highly predictable. With the rise of dynamic neural networks, however, data-dependent inferences exhibit higher variance in execution times and are poorly captured by a single point estimate of the true running time. With Orloj, we show that treating and modeling inference execution times as probability distributions brings large gains for scheduling inference requests under SLO constraints.

In this dissertation, we consider combining application- and system-specific information with scheduler design as a means of efficiently supporting new hardware and new DL use cases. Nevertheless, the pursuit of higher efficiency never ends. This dissertation lays down the necessary mechanisms with the hope that our crude work may serve as a basis for further research toward better scheduling algorithms and more efficient systems in the DL infrastructure. (Illustrative code sketches of the scheduling ideas behind Salus, Fluid, and Orloj follow the item record below.)
dc.language.iso: en_US
dc.subject: Deep Learning
dc.subject: Scheduling
dc.subject: GPU
dc.subject: Inference
dc.subject: Training
dc.subject: Hyperparameter Tuning
dc.title: Application-Aware Scheduling in Deep Learning Software Stacks
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Computer Science & Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Chowdhury, Mosharaf
dc.contributor.committeemember: Duraisamy, Karthik
dc.contributor.committeemember: Jin, Xin
dc.contributor.committeemember: Kasikci, Baris
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbtoplevel: Engineering
dc.subject.hlbtoplevel: Science
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/174199/1/peifeng_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/5930
dc.identifier.orcid: 0000-0001-7001-6647
dc.identifier.name-orcid: YU, Peifeng; 0000-0001-7001-6647
dc.working.doi: 10.7302/5930
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
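
To make the abstract's mechanisms concrete, here is a minimal Python sketch of iteration-granularity GPU time-sharing in the spirit of Salus' fast job switching. It is an illustration under stated assumptions, not Salus' implementation: the Job class, the run_one_iteration hook, and the round-robin policy are hypothetical stand-ins.

    from collections import deque

    class Job:
        """Hypothetical training job that exposes one-iteration execution."""
        def __init__(self, name, iterations):
            self.name, self.remaining = name, iterations

        def run_one_iteration(self):
            # Stand-in for launching one training iteration's kernels on the GPU.
            self.remaining -= 1

    def share_gpu(jobs):
        """Round-robin a single GPU at iteration boundaries: DL training is
        iterative, so switching jobs between iterations keeps the device busy
        without needing hardware preemption."""
        queue = deque(jobs)
        while queue:
            job = queue.popleft()
            job.run_one_iteration()
            if job.remaining > 0:
                queue.append(job)  # re-queue unfinished jobs

    share_gpu([Job("resnet", 3), Job("bert", 2)])

Salus additionally keeps each switched-out job's persistent model state resident in GPU memory (its memory-sharing primitive); that part is omitted from this sketch.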
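The water-filling idea behind Fluid's trial scheduling can be sketched as repeatedly topping up whichever trial currently holds the least resources. The unit-GPU granularity, the per-trial parallelism caps, and the water_fill function below are assumptions for illustration, not Fluid's actual algorithm.

    import heapq

    def water_fill(trials, total_gpus):
        """Water-filling allocation: give the next GPU to the trial with the
        smallest current allocation until capacity runs out or every trial
        reaches its maximum useful parallelism (its cap)."""
        caps = dict(trials)
        alloc = {tid: 0 for tid in caps}
        heap = [(0, tid) for tid in caps]  # (current allocation, trial id)
        heapq.heapify(heap)
        remaining = total_gpus
        while remaining > 0 and heap:
            current, tid = heapq.heappop(heap)
            if current >= caps[tid]:
                continue  # trial saturated; stop considering it
            alloc[tid] = current + 1
            remaining -= 1
            heapq.heappush(heap, (current + 1, tid))
        return alloc

    # Five GPUs across three trials with parallelism caps 4, 2, and 1:
    print(water_fill([("t1", 4), ("t2", 2), ("t3", 1)], total_gpus=5))
    # {'t1': 2, 't2': 2, 't3': 1}

The water level rises evenly: no trial receives a second GPU before every unsaturated trial has its first, which evens out progress across interdependent trials.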
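Finally, the distributional view Orloj takes of inference latency can be illustrated by estimating SLO attainment from an empirical latency distribution rather than a point estimate. The request fields (cls, deadline) and the highest-success-probability-first policy are hypothetical; Orloj's actual scheduler is not reproduced here.

    import numpy as np

    def p_meet_slo(latency_samples, budget):
        """Empirical P(latency <= budget) for a request class, estimated from
        observed execution times instead of a single point estimate."""
        return float((np.asarray(latency_samples) <= budget).mean())

    def pick_next(queue, now, samples_by_class):
        """Run the queued request most likely to still meet its SLO given its
        remaining time budget; hopeless requests naturally score near zero."""
        return max(queue, key=lambda r: p_meet_slo(
            samples_by_class[r["cls"]], r["deadline"] - now))

    history = {"short": [8, 9, 12], "long": [30, 45, 80]}
    queue = [{"cls": "short", "deadline": 120}, {"cls": "long", "deadline": 140}]
    print(pick_next(queue, now=100, samples_by_class=history))  # the "short" request
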


