Understanding and Identifying Challenges in Design of Safety-Critical AI Systems

dc.contributor.author: Lubana, Ekdeep Singh
dc.date.accessioned: 2025-01-06T18:18:34Z
dc.date.available: 2025-01-06T18:18:34Z
dc.date.issued: 2024
dc.date.submitted: 2024
dc.identifier.uri: https://hdl.handle.net/2027.42/196092
dc.description.abstract: Increasingly powerful and general-purpose AI systems have found their way into our daily lives. As these systems spread to high-stakes applications, ensuring their safe deployment has become crucial: we must ensure these systems augment and benefit our society, rather than becoming an active source of harm. To this end, recent regulatory work has sought to define standards for identifying and mitigating vulnerabilities in AI-driven applications. These works have grounded themselves in frameworks of risk regulation, motivated by the fact that establishing causality in AI-caused harms can be difficult. The goal of this dissertation is to challenge this design choice. Over three broad parts, the presented work argues that peculiarities of modern neural networks, such as their increasingly open-ended nature, lead to loopholes in the off-the-shelf use of risk regulation as a framework for ensuring the development of safe AI. The contributions of each part are summarized as follows. (i) Risk regulation assumes one can preemptively list the harms expected from a system, allowing the design of protocols to monitor them. In the first part, we present several empirical and formal models that demonstrate the unpredictable nature of neural network capabilities, showing that they can emerge suddenly and enable a network to perform tasks it was not intended for. Specifically, we show that emergent learning occurs either when general structures underlying the data-generating process are learned by the model, hence accelerating the learning of narrower tasks, or when a task is compositional in nature and the capabilities relevant to performing the composition are learned by the model. Such unpredictability renders preemptive enumeration of risks infeasible. (ii) Risk assessment requires well-defined evaluations. In the second part, we analyze two challenges to this goal. First, we show that when evaluating compositional capabilities, models exhibit an intriguing phenomenon: well before standard benchmarking would indicate that the model possesses a capability, there exist latent interventions that can force the model to generate the desired output. Such capabilities can therefore evade risk-assessment evaluations. Second, we analyze how minor input modifications can significantly alter a model’s behavior, complicating the establishment of safe-use standards. We formalize this problem as input underspecification and analyze the mechanisms a model uses to infer which solution, among the spectrum of valid ones, should be used to respond to an input. We show evidence suggesting that large-scale models engage in a Bayesian selection protocol, i.e., minimal input changes that alter the posterior probability of a solution can completely change these models’ outputs. (iii) Fine-tuning protocols are the current de facto strategy for mitigating vulnerabilities identified in a neural network. In the final part, we identify limitations in these protocols, revealing that they learn minimalistic “wrappers” over a model’s base capabilities and hence do not adequately suppress undesirable behaviors outside the fine-tuning data distribution. Overall, the contributions of this dissertation suggest that regulation of AI systems requires exploration of novel and more nuanced paradigms that go beyond mere risk regulation. This can involve intermixing several viable frameworks, e.g., liability-based models and tort law, to define backstops for when risk regulation fails to adequately foresee the harms possible from a system.
dc.language.iso: en_US
dc.subject: Emergent abilities in neural networks
dc.subject: Risk regulation
dc.title: Understanding and Identifying Challenges in Design of Safety-Critical AI Systems
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Electrical and Computer Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Dick, Robert
dc.contributor.committeemember: Card, Dallas
dc.contributor.committeemember: Krueger, David
dc.contributor.committeemember: Owens, Andrew
dc.contributor.committeemember: Tanaka, Hidenori
dc.subject.hlbsecondlevel: Electrical Engineering
dc.subject.hlbtoplevel: Engineering
dc.subject.hlbtoplevel: Science
dc.contributor.affiliationumcampus: Ann Arbor
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/196092/1/eslubana_1.pdf
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/196092/2/eslubana_2.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/25028
dc.identifier.orcid: 0009-0004-9849-7859
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)


