Understanding and Identifying Challenges in Design of Safety-Critical AI Systems

dc.contributor.author: Lubana, Ekdeep Singh
dc.date.accessioned: 2025-01-06T18:18:34Z
dc.date.available: 2025-01-06T18:18:34Z
dc.date.issued: 2024
dc.date.submitted: 2024
dc.identifier.uri: https://hdl.handle.net/2027.42/196092
dc.description.abstract: Increasingly powerful and general-purpose AI systems have found their way into our daily lives. As these systems spread to high-stakes applications, ensuring their safe deployment has become crucial: we must ensure these systems augment and benefit our society, rather than becoming an active source of harm. To this end, recent regulatory work has sought to define standards for identifying and mitigating vulnerabilities in AI-driven applications. These works have grounded themselves in frameworks of risk regulation, motivated by the fact that establishing causality in AI-caused harms can be difficult. The goal of this dissertation is to challenge this design choice. Over three broad parts, the presented work argues that peculiarities of modern neural networks, such as their increasingly open-ended nature, lead to loopholes in the off-the-shelf use of risk regulation as a framework for ensuring the development of safe AI. The contributions of each part are summarized as follows. (i) Risk regulation assumes one can preemptively list the harms expected from a system, allowing the design of protocols to monitor them. In the first part, we present several empirical and formal models that demonstrate the unpredictable nature of neural network capabilities, showing that they can emerge suddenly and enable a network to perform tasks it was not intended for. Specifically, we show that emergent learning occurs either when general structures underlying the data-generating process are learned by the model, hence accelerating the learning of narrower tasks, or when a task is compositional in nature and the capabilities relevant to performing the composition are learned by the model. Such unpredictability renders preemptive enumeration of risks infeasible. (ii) Risk assessment requires well-defined evaluations. In the second part, we analyze two challenges to this goal. First, we show that when evaluating compositional capabilities, models exhibit an intriguing phenomenon: well before standard benchmarking would indicate that the model possesses a capability, there exist latent interventions that can force the model to generate the desired output. Such capabilities can therefore evade risk-assessment evaluations. Second, we analyze how minor input modifications can significantly alter a model’s behavior, complicating the establishment of safe-use standards. We formalize this problem as input underspecification and analyze the mechanisms a model uses to infer which solution, among the spectrum of valid ones, should be used to respond to an input. We show evidence suggesting that large-scale models engage in a Bayesian selection protocol, i.e., minimal input changes that alter the posterior probability of a solution can completely change these models’ outputs. (iii) Fine-tuning protocols are the current de facto strategy for mitigating vulnerabilities identified in a neural network. In the final part, we identify limitations in these protocols, revealing that they learn minimalistic “wrappers” over a model’s base capabilities and hence do not adequately suppress undesirable behaviors outside the fine-tuning data distribution. Overall, the contributions of this dissertation suggest that regulation of AI systems requires exploration of novel and more nuanced paradigms that go beyond mere risk regulation. This can involve intermixing several viable frameworks, e.g., liability-based models and tort law, to define backstops for when risk regulation fails to adequately foresee the harms possible from a system.
dc.language.iso: en_US
dc.subject: Emergent abilities in neural networks
dc.subject: Risk regulation
dc.title: Understanding and Identifying Challenges in Design of Safety-Critical AI Systems
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Electrical and Computer Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Dick, Robert
dc.contributor.committeemember: Card, Dallas
dc.contributor.committeemember: Krueger, David
dc.contributor.committeemember: Owens, Andrew
dc.contributor.committeemember: Tanaka, Hidenori
dc.subject.hlbsecondlevel: Electrical Engineering
dc.subject.hlbtoplevel: Engineering
dc.subject.hlbtoplevel: Science
dc.contributor.affiliationumcampus: Ann Arbor
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/196092/1/eslubana_1.pdf
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/196092/2/eslubana_2.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/25028
dc.identifier.orcid: 0009-0004-9849-7859
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)


