Show simple item record

Advances in Sequential Decision Making Problems with Causal and Low-Rank Structures

dc.contributor.authorLu, Yangyi
dc.date.accessioned2022-09-06T16:16:40Z
dc.date.available2022-09-06T16:16:40Z
dc.date.issued2022
dc.date.submitted2022
dc.identifier.urihttps://hdl.handle.net/2027.42/174494
dc.description.abstractBandits and Markov Decision Processes are powerful sequential decision making paradigms that have been widely applied to solve real world problems. However, existing algorithms often suffer from high sample complexity due to the large action space. In this thesis, we present several contributions to reduce the sample complexity by exploiting the problem structure. In the first part, we study how to utilize the given causal information represented as a causal graph along with associated conditional distributions for bandit problems. We propose two algorithms, causal upper confidence bound (C-UCB) and causal Thompson Sampling (C-TS), that enjoy improved cumulative regret bounds compared with algorithms that do not use causal information. Further, we extend C-UCB and C-TS to the linear bandit setting. We also show that under certain causal structures, our algorithms scale better than the standard bandit algorithms as the number of interventions increases. In the second part, we further explore how to utilize the given causal information for Markov Decision Processes. We introduce causal Markov Decision Processes, a new formalism for sequential decision making which combines the standard Markov Decision Process formulation with causal structures over state transition and reward functions. We propose the causal upper confidence bound value iteration (C-UCBVI) algorithm that exploits the causal structure and improves the performance of standard reinforcement learning algorithms that do not take causal knowledge into account. To tackle the large state space problem in Markov Decision Process, we further formulate causal factored Markov Decision Process and design new algorithms with reduced regret. Lastly, we explore the connection between linear Markov Decision Process and causal Markov Decision Process. In the third part, we tackle the challenging setting where the causal information is unknown. We propose mild identifiability conditions and design new causal bandit algorithms for causal trees, causal forests and a general class of causal graphs. We prove that the regret guarantees of our algorithms greatly improve upon those of standard multi-armed bandit algorithms. Lastly, we prove our mild conditions are necessary: without them one cannot do better than standard bandit algorithms. In the fourth part, we investigate a challenging problem associated with the causal structure: unobserved confounders. We study to what extent the unobserved con- founders affect the estimation in the offline policy evaluation problem in reinforcement learning. We give the first minimax lower bound for error due to unobserved con- founder. We also analyze two algorithms and show they are minimax optimal. Lastly, we propose a new model-based method and show it is never worse than the model-free method proposed in prior work. In the last part, we explore another problem structure, the low-rank property of the ground truth parameter. We study linear bandits and generalized linear bandits, and we present algorithms via a novel combination of online-to-confidence-set conver- sion and the exponentially weighted average forecaster constructed by a covering of low-rank matrices. To get around the computational intractability of covering based approaches, we propose an efficient algorithm using the subspace exploration tech- nique. Our theoretical and empirical results demonstrate the effectiveness of utilizing the low-rank structures in reducing the regret.
dc.language.isoen_US
dc.subjectSequential Decision Making
dc.subjectBandits
dc.subjectReinforcement Learning
dc.subjectCausality
dc.subjectCausal Graph
dc.titleAdvances in Sequential Decision Making Problems with Causal and Low-Rank Structures
dc.typeThesis
dc.description.thesisdegreenamePhDen_US
dc.description.thesisdegreedisciplineStatistics
dc.description.thesisdegreegrantorUniversity of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeememberLevina, Liza
dc.contributor.committeememberScott, Clayton D
dc.contributor.committeememberHe, Xuming
dc.contributor.committeememberNguyen, Long
dc.subject.hlbsecondlevelStatistics and Numeric Data
dc.subject.hlbtoplevelScience
dc.description.bitstreamurlhttp://deepblue.lib.umich.edu/bitstream/2027.42/174494/1/yylu_1.pdf
dc.identifier.doihttps://dx.doi.org/10.7302/6225
dc.identifier.orcid0000-0002-4216-5386
dc.identifier.name-orcidLu, Yangyi; 0000-0002-4216-5386en_US
dc.working.doi10.7302/6225en
dc.owningcollnameDissertations and Theses (Ph.D. and Master's)


Files in this item

Show simple item record

Remediation of Harmful Language

The University of Michigan Library aims to describe its collections in a way that respects the people and communities who create, use, and are represented in them. We encourage you to Contact Us anonymously if you encounter harmful or problematic language in catalog records or finding aids. More information about our policies and practices is available at Remediation of Harmful Language.

Accessibility

If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.