Provably Efficient Reinforcement Learning Under Linear Model Structures: From Tabular to Feature Based Exploration
dc.contributor.author | Modi, Aditya | |
dc.date.accessioned | 2022-01-19T15:23:53Z | |
dc.date.available | 2022-01-19T15:23:53Z | |
dc.date.issued | 2021 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/171361 | |
dc.description.abstract | Reinforcement learning (RL) is a machine learning paradigm where an agent learns to interact with an environment in the presence of a reward signal as feedback. Recent breakthroughs have led to a renewed interest in building intelligent RL agents by combining core RL algorithms with the expressivity of deep function approximators and advances in computation via simulation. Despite these advances, in most complex domains RL algorithms require a large amount of interaction data in order to learn a good policy. As a result, recent theoretical research has focused on problems pertaining to the quantification and improvement of the sample efficiency of RL under various interaction protocols. These efforts are directed towards understanding the statistical aspects of reinforcement learning, which can be a key factor in making progress towards real-world applications, ranging from healthcare and robotics to the control of large-scale web systems. The main theme of this thesis is the analysis of such information-theoretic aspects of RL in terms of the structural complexity of the environment, using tools from the learning theory literature. This thesis considers a spectrum of scenarios, ranging from tabular to rich-observation domains, from single-task to multi-task settings, and from reward-specific to reward-free learning. Specifically, it presents theoretical results in the following settings: the first part studies the sample efficiency of online learning in contextual Markov decision processes (MDPs), where the agent interacts with a sequence of tabular environments and has access to a high-dimensional representation that determines the dynamics of each MDP. The second part studies the sample complexity of learning in a structured model class where the true model of an arbitrarily large MDP can be approximated by a feature-based linear combination of a known ensemble of models. The third part investigates the problem of learning in a low-rank MDP, for which I design and analyze the first provably efficient model-free representation learning and exploration algorithm for reward-free learning. Lastly, in the fourth part, I provide results for online multi-task learning in linear quadratic regulators, under the assumption that the transition matrices of the individual systems share a common basis. A common thread running through all results in this thesis is the effect of a linear or low-rank structure in the model on the sample efficiency of learning. Overall, this thesis proposes novel provably efficient exploration and model selection algorithms under various linear model structures, and highlights the role of environment complexity in the statistical efficiency of reinforcement learning. | |
dc.language.iso | en_US | |
dc.subject | Reinforcement learning and control | |
dc.subject | Statistical and online learning theory | |
dc.subject | Linear systems | |
dc.subject | Efficient exploration | |
dc.subject | Sequential decision making | |
dc.title | Provably Efficient Reinforcement Learning Under Linear Model Structures: From Tabular to Feature Based Exploration | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Computer Science & Engineering | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Baveja, Satinder Singh | |
dc.contributor.committeemember | Tewari, Ambuj | |
dc.contributor.committeemember | Scott, Clayton D | |
dc.contributor.committeemember | Cheraghchi, Mahdi | |
dc.contributor.committeemember | Jiang, Nan | |
dc.subject.hlbsecondlevel | Computer Science | |
dc.subject.hlbsecondlevel | Mathematics | |
dc.subject.hlbtoplevel | Engineering | |
dc.subject.hlbtoplevel | Science | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/171361/1/admodi_1.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/3873 | |
dc.identifier.orcid | 0000-0002-6959-0593 | |
dc.identifier.name-orcid | Modi, Aditya; 0000-0002-6959-0593 | en_US |
dc.working.doi | 10.7302/3873 | en |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |