Advances in Sequential Decision Making Problems with Causal and Low-Rank Structures

Lu, Yangyi

Advances in Sequential Decision Making Problems with Causal and Low-Rank Structures

dc.contributor.author	Lu, Yangyi
dc.date.accessioned	2022-09-06T16:16:40Z
dc.date.available	2022-09-06T16:16:40Z
dc.date.issued	2022
dc.date.submitted	2022
dc.identifier.uri	https://hdl.handle.net/2027.42/174494
dc.description.abstract	Bandits and Markov Decision Processes are powerful sequential decision making paradigms that have been widely applied to solve real world problems. However, existing algorithms often suffer from high sample complexity due to the large action space. In this thesis, we present several contributions to reduce the sample complexity by exploiting the problem structure. In the first part, we study how to utilize the given causal information represented as a causal graph along with associated conditional distributions for bandit problems. We propose two algorithms, causal upper confidence bound (C-UCB) and causal Thompson Sampling (C-TS), that enjoy improved cumulative regret bounds compared with algorithms that do not use causal information. Further, we extend C-UCB and C-TS to the linear bandit setting. We also show that under certain causal structures, our algorithms scale better than the standard bandit algorithms as the number of interventions increases. In the second part, we further explore how to utilize the given causal information for Markov Decision Processes. We introduce causal Markov Decision Processes, a new formalism for sequential decision making which combines the standard Markov Decision Process formulation with causal structures over state transition and reward functions. We propose the causal upper confidence bound value iteration (C-UCBVI) algorithm that exploits the causal structure and improves the performance of standard reinforcement learning algorithms that do not take causal knowledge into account. To tackle the large state space problem in Markov Decision Process, we further formulate causal factored Markov Decision Process and design new algorithms with reduced regret. Lastly, we explore the connection between linear Markov Decision Process and causal Markov Decision Process. In the third part, we tackle the challenging setting where the causal information is unknown. We propose mild identifiability conditions and design new causal bandit algorithms for causal trees, causal forests and a general class of causal graphs. We prove that the regret guarantees of our algorithms greatly improve upon those of standard multi-armed bandit algorithms. Lastly, we prove our mild conditions are necessary: without them one cannot do better than standard bandit algorithms. In the fourth part, we investigate a challenging problem associated with the causal structure: unobserved confounders. We study to what extent the unobserved con- founders affect the estimation in the offline policy evaluation problem in reinforcement learning. We give the first minimax lower bound for error due to unobserved con- founder. We also analyze two algorithms and show they are minimax optimal. Lastly, we propose a new model-based method and show it is never worse than the model-free method proposed in prior work. In the last part, we explore another problem structure, the low-rank property of the ground truth parameter. We study linear bandits and generalized linear bandits, and we present algorithms via a novel combination of online-to-confidence-set conver- sion and the exponentially weighted average forecaster constructed by a covering of low-rank matrices. To get around the computational intractability of covering based approaches, we propose an efficient algorithm using the subspace exploration tech- nique. Our theoretical and empirical results demonstrate the effectiveness of utilizing the low-rank structures in reducing the regret.
dc.language.iso	en_US
dc.subject	Sequential Decision Making
dc.subject	Bandits
dc.subject	Reinforcement Learning
dc.subject	Causality
dc.subject	Causal Graph
dc.title	Advances in Sequential Decision Making Problems with Causal and Low-Rank Structures
dc.type	Thesis
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Statistics
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Levina, Liza
dc.contributor.committeemember	Scott, Clayton D
dc.contributor.committeemember	He, Xuming
dc.contributor.committeemember	Nguyen, Long
dc.subject.hlbsecondlevel	Statistics and Numeric Data
dc.subject.hlbtoplevel	Science
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/174494/1/yylu_1.pdf
dc.identifier.doi	https://dx.doi.org/10.7302/6225
dc.identifier.orcid	0000-0002-4216-5386
dc.identifier.name-orcid	Lu, Yangyi; 0000-0002-4216-5386	en_US
dc.working.doi	10.7302/6225	en
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name:: yylu_1.pdf
Size:: 1.602MB
Format:: PDF

View/Open

Dissertations and Theses (Ph.D. and Master's)

Show simple item record

Remediation of Harmful Language

The University of Michigan Library aims to describe its collections in a way that respects the people and communities who create, use, and are represented in them. We encourage you to Contact Us anonymously if you encounter harmful or problematic language in catalog records or finding aids. More information about our policies and practices is available at Remediation of Harmful Language.

Accessibility

If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.