Advances in Sequential Decision Making Problems with Causal and Low-Rank Structures
dc.contributor.author | Lu, Yangyi | |
dc.date.accessioned | 2022-09-06T16:16:40Z | |
dc.date.available | 2022-09-06T16:16:40Z | |
dc.date.issued | 2022 | |
dc.date.submitted | 2022 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/174494 | |
dc.description.abstract | Bandits and Markov Decision Processes are powerful sequential decision making paradigms that have been widely applied to solve real-world problems. However, existing algorithms often suffer from high sample complexity due to the large action space. In this thesis, we present several contributions that reduce the sample complexity by exploiting problem structure. In the first part, we study how to utilize given causal information, represented as a causal graph along with its associated conditional distributions, for bandit problems. We propose two algorithms, causal upper confidence bound (C-UCB) and causal Thompson Sampling (C-TS), that enjoy improved cumulative regret bounds compared with algorithms that do not use causal information. Further, we extend C-UCB and C-TS to the linear bandit setting. We also show that under certain causal structures, our algorithms scale better than standard bandit algorithms as the number of interventions increases. In the second part, we further explore how to utilize given causal information for Markov Decision Processes. We introduce causal Markov Decision Processes, a new formalism for sequential decision making that combines the standard Markov Decision Process formulation with causal structures over state transition and reward functions. We propose the causal upper confidence bound value iteration (C-UCBVI) algorithm, which exploits the causal structure and improves on the performance of standard reinforcement learning algorithms that do not take causal knowledge into account. To tackle the large state-space problem in Markov Decision Processes, we further formulate the causal factored Markov Decision Process and design new algorithms with reduced regret. Lastly, we explore the connection between linear Markov Decision Processes and causal Markov Decision Processes. In the third part, we tackle the challenging setting where the causal information is unknown.
We propose mild identifiability conditions and design new causal bandit algorithms for causal trees, causal forests, and a general class of causal graphs. We prove that the regret guarantees of our algorithms greatly improve upon those of standard multi-armed bandit algorithms. Lastly, we prove that our mild conditions are necessary: without them, one cannot do better than standard bandit algorithms. In the fourth part, we investigate a challenging problem associated with the causal structure: unobserved confounders. We study the extent to which unobserved confounders affect estimation in the offline policy evaluation problem in reinforcement learning. We give the first minimax lower bound for the error due to unobserved confounders. We also analyze two algorithms and show that they are minimax optimal. Lastly, we propose a new model-based method and show that it is never worse than the model-free method proposed in prior work. In the last part, we explore another problem structure: the low-rank property of the ground truth parameter. We study linear bandits and generalized linear bandits, and we present algorithms via a novel combination of online-to-confidence-set conversion and the exponentially weighted average forecaster constructed by a covering of low-rank matrices. To get around the computational intractability of covering-based approaches, we propose an efficient algorithm using the subspace exploration technique. Our theoretical and empirical results demonstrate the effectiveness of utilizing low-rank structure in reducing the regret. | |
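The causal-bandit idea summarized in the abstract can be illustrated with a minimal sketch. This is not the thesis's exact C-UCB algorithm; it is a hypothetical toy setting in which each intervention `a` induces a known distribution `P(Z | do(a))` over an observed intermediate variable `Z`, and the reward depends only on `Z`. Because reward estimates are indexed by `Z` rather than by arm, every pull informs all interventions at once, which is the mechanism behind the improved regret bounds:

```python
import numpy as np

# Toy causal-UCB-style bandit (illustrative only, not the thesis's C-UCB).
# Each intervention a fixes a known conditional P(Z | do(a)); the reward Y
# depends only on the observed intermediate variable Z, so estimates of
# E[Y | Z = z] are shared across all interventions.

rng = np.random.default_rng(0)
n_arms, n_z, horizon = 4, 3, 2000

# Known conditional distributions P(Z = z | do(a)); each row sums to 1.
p_z_given_a = rng.dirichlet(np.ones(n_z), size=n_arms)
true_reward_given_z = np.array([0.2, 0.5, 0.9])  # hypothetical E[Y | Z = z]

z_counts = np.zeros(n_z)       # pulls observed per value of Z
z_reward_sums = np.zeros(n_z)  # reward accumulated per value of Z
total_reward = 0.0

for t in range(1, horizon + 1):
    # UCB on E[Y | Z = z], averaged through the known P(Z | do(a)).
    safe = np.maximum(z_counts, 1)
    z_means = np.where(z_counts > 0, z_reward_sums / safe, 1.0)
    z_bonus = np.where(z_counts > 0, np.sqrt(2 * np.log(t) / safe), 1.0)
    ucb_z = np.minimum(z_means + z_bonus, 1.0)
    arm = int(np.argmax(p_z_given_a @ ucb_z))

    # Environment step: the intervention draws Z, then a Bernoulli reward.
    z = rng.choice(n_z, p=p_z_given_a[arm])
    y = rng.binomial(1, true_reward_given_z[z])
    z_counts[z] += 1
    z_reward_sums[z] += y
    total_reward += y

best_mean = float((p_z_given_a @ true_reward_given_z).max())
print(f"average reward {total_reward / horizon:.3f}, best arm mean {best_mean:.3f}")
```

The key design choice is that the confidence bonus shrinks with the count of each `Z` value, not the count of each arm, so the effective exploration cost scales with the number of `Z` values rather than the number of interventions.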
dc.language.iso | en_US | |
dc.subject | Sequential Decision Making | |
dc.subject | Bandits | |
dc.subject | Reinforcement Learning | |
dc.subject | Causality | |
dc.subject | Causal Graph | |
dc.title | Advances in Sequential Decision Making Problems with Causal and Low-Rank Structures | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Statistics | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Levina, Liza | |
dc.contributor.committeemember | Scott, Clayton D | |
dc.contributor.committeemember | He, Xuming | |
dc.contributor.committeemember | Nguyen, Long | |
dc.subject.hlbsecondlevel | Statistics and Numeric Data | |
dc.subject.hlbtoplevel | Science | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/174494/1/yylu_1.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/6225 | |
dc.identifier.orcid | 0000-0002-4216-5386 | |
dc.identifier.name-orcid | Lu, Yangyi; 0000-0002-4216-5386 | en_US |
dc.working.doi | 10.7302/6225 | en |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |