Hazy Oracles in Deep Learning
Lemmer, Stephan
2023
Abstract
While deep learning problems are often motivated as enabling technologies for human-computer interaction---a robot, for example, must align natural language referents and sensor readings to operate in a human world---the assumptions of these works make them poorly suited to real-world human interaction. Specifically, evaluation typically assumes that humans are oracles that provide semantically correct and unambiguous information, and that all such information is equally useful. While this is enforced in controlled experiments via carefully curated datasets, models operating in the wild will need to compensate for the fact that humans are hazy oracles that may provide information that is incorrect, ambiguous, or misaligned with the features learned by the model. A natural question follows: how can we use models trained under the oracle assumption with hazy humans?
We answer this question via a method we call deferred inference, which allows models trained via supervised learning to solicit and integrate additional information from the human when necessary. Deferred inference begins with a method for determining whether the model should defer inference and wait for additional human-provided information. Past work has generally simplified this to one of two questions: is the human-provided information correct, or is the output correct? However, we find that these approaches are insufficient due to the complex relationship between human inputs, sensor readings, and deep models: low-quality human-provided information may not cause error, while high-quality human-provided information may not correct it. To address this misalignment, we introduce Dual-loss Additional Error Regression (DAER), a method that successfully locates instances where a new human input can reduce error.
After introducing DAER, we note that we must consider not only how to find error caused by human input, but also how to integrate potentially noisy deferral responses and how to measure overall performance. For this, we introduce aggregation functions that integrate information across multiple inferences and a novel evaluation framework that measures the trade-off between error and additional human effort. Through this evaluation, we show that we can reduce error by up to 48% at a reasonable level of human effort, without any changes to training or architecture.
Last, we consider how to shift from datasets to individuals. While crowdsourced datasets allow rapid implementation and evaluation of deferral and aggregation functions, they do not accurately model human-computer interaction: the mechanisms used to crowdsource data impose distribution shifts, and the failure to identify individual annotators imposes the tacit assumptions that all humans are the same and that inputs do not change over time or with deferral depth. Through a human-centered experiment, we show that these assumptions do not hold: an ideal deferral function must be calibrated for a specific user, users learn the model over time, and the deferral response is likely to be of lower quality than the initial query.
While deep-learned models have been proposed for many applications that require cooperation between humans and computers, deploying models that were trained and evaluated on carefully curated datasets remains a challenge due to the hazy nature of human inputs. In this dissertation, we propose deferred inference as a method for addressing this challenge while respecting the paradigm of supervised training. By demonstrating deferred inference on four disparate problems, we provide insights into its challenges, benefits, and generalizability that motivate and lay the foundation for the eventual deployment of deep-learned human-in-the-loop models.
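To make the deferred-inference loop described in the abstract concrete, the minimal sketch below shows one possible structure, assuming a hypothetical deferral scorer standing in for a learned estimator such as DAER, and a simple probability-averaging aggregation function. All names, signatures, and the placeholder score are illustrative assumptions, not the dissertation's actual implementation.

import torch

def deferral_score(output, human_input, sensor_input):
    """Placeholder for a learned estimator such as DAER, which would
    predict how much error a new human input could remove. Returns a
    fixed value here purely so the sketch runs end to end."""
    return 0.0

def aggregate_mean(outputs):
    # One simple aggregation function: average class probabilities
    # across all inferences rather than trusting only the last one.
    return torch.stack(outputs).softmax(dim=-1).mean(dim=0)

def deferred_inference(model, sensor_input, get_human_input,
                       threshold=0.5, max_deferrals=2):
    """Illustrative deferred-inference loop.

    The model is trained normally under the oracle assumption; only
    inference changes. When the deferral score predicts that a new
    human input would reduce error, we solicit one and re-infer."""
    human_input = get_human_input()                  # initial query
    outputs = [model(sensor_input, human_input)]
    for _ in range(max_deferrals):
        score = deferral_score(outputs[-1], human_input, sensor_input)
        if score < threshold:                        # trusted; stop deferring
            break
        human_input = get_human_input()              # defer: new human input
        outputs.append(model(sensor_input, human_input))
    # Aggregating across inferences guards against a noisy deferral
    # response overriding a good initial one.
    return aggregate_mean(outputs)

Under this framing, the error-versus-effort evaluation the abstract describes would correspond to sweeping the deferral threshold: a lower threshold defers more often, spending additional human effort in exchange for reduced error.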
Subjects
Human-Computer Interaction; Computer Vision; Deep Learning; Deferred Inference
Types
Thesis