Provably Efficient Reinforcement Learning Under Linear Model Structures: From Tabular to Feature Based Exploration
dc.contributor.author | Modi, Aditya | |
dc.date.accessioned | 2022-01-19T15:23:53Z | |
dc.date.available | 2022-01-19T15:23:53Z | |
dc.date.issued | 2021 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/171361 | |
dc.description.abstract | Reinforcement learning (RL) is a machine learning paradigm where an agent learns to interact with an environment in the presence of a reward signal as feedback. Recent breakthroughs have led to a renewed interest in building intelligent RL agents by combining core RL algorithms with the expressivity of deep function approximators and advances in computation via simulation. Despite these advances, in most complex domains RL algorithms require a large amount of interaction data in order to learn a good policy. As a result, recent theoretical research has focused on problems pertaining to the quantification and improvement of the sample efficiency of RL under various interaction protocols. These efforts are directed towards understanding the statistical aspects of reinforcement learning, which can be a key factor in making progress towards real-world applications, ranging from healthcare and robotics to the control of large-scale web systems. The main theme of this thesis is the analysis of such information-theoretic aspects of RL in terms of the structural complexity of the environment, using tools from the learning theory literature. This thesis considers a spectrum of scenarios, ranging from tabular to rich-observation domains, from single-task to multi-task settings, and from reward-specific to reward-free learning. Specifically, it presents theoretical results in the following settings: the first part studies the sample efficiency of online learning in contextual Markov decision processes (MDPs), where the agent interacts with a sequence of tabular environments and has access to a high-dimensional representation that determines the dynamics of each MDP. The second part studies the sample complexity of learning in a structured model class where the true model of an arbitrarily large MDP can be approximated by a feature-based linear combination of a known ensemble of models. The third part investigates the problem of learning in a low-rank MDP, for which I design and analyze the first provably efficient model-free representation learning and exploration algorithm for reward-free learning. Lastly, in the fourth part, I provide results for online multi-task learning in linear quadratic regulators, under the assumption that the transition matrices of the individual systems share a common basis. A common thread running through all results in this thesis is the effect of a linear or low-rank structure in the model on the sample efficiency of learning. Overall, this thesis proposes novel provably efficient exploration and model selection algorithms under various linear model structures, and highlights the role of environment complexity in the statistical efficiency of reinforcement learning. | |
dc.language.iso | en_US | |
dc.subject | Reinforcement learning and control | |
dc.subject | Statistical and online learning theory | |
dc.subject | Linear systems | |
dc.subject | Efficient exploration | |
dc.subject | Sequential decision making | |
dc.title | Provably Efficient Reinforcement Learning Under Linear Model Structures: From Tabular to Feature Based Exploration | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Computer Science & Engineering | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Baveja, Satinder Singh | |
dc.contributor.committeemember | Tewari, Ambuj | |
dc.contributor.committeemember | Scott, Clayton D | |
dc.contributor.committeemember | Cheraghchi, Mahdi | |
dc.contributor.committeemember | Jiang, Nan | |
dc.subject.hlbsecondlevel | Computer Science | |
dc.subject.hlbsecondlevel | Mathematics | |
dc.subject.hlbtoplevel | Engineering | |
dc.subject.hlbtoplevel | Science | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/171361/1/admodi_1.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/3873 | |
dc.identifier.orcid | 0000-0002-6959-0593 | |
dc.identifier.name-orcid | Modi, Aditya; 0000-0002-6959-0593 | en_US |
dc.working.doi | 10.7302/3873 | en |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |