
Advances in Deep Reinforcement Learning: Intrinsic Rewards, Temporal Credit Assignment, State Representations, and Value-equivalent Models

dc.contributor.author: Zheng, Zeyu
dc.date.accessioned: 2022-09-06T16:25:03Z
dc.date.available: 2022-09-06T16:25:03Z
dc.date.issued: 2022
dc.date.submitted: 2022
dc.identifier.uri: https://hdl.handle.net/2027.42/174601
dc.description.abstract: Reinforcement learning (RL) is a machine learning paradigm concerned with how an agent learns to predict and control its own experience stream so as to maximize long-term cumulative reward. In the past decade, deep reinforcement learning (DeepRL), a subfield that combines the sequential decision-making techniques of RL with the powerful non-linear function approximation tools offered by deep learning, has seen successes such as defeating human champions in the ancient board game Go and achieving expert-level performance in complex strategy games like Dota 2 and StarCraft. It has also had an impact on real-world applications, including robot control, stratospheric balloon navigation, and control of nuclear fusion plasma. This thesis aims to further advance DeepRL techniques. Concretely, it makes contributions in the following four directions: 1) In reward design, we develop a novel meta-learning algorithm for learning reward functions that facilitate policy optimization. Our algorithm improves the performance of policy-gradient methods and outperforms handcrafted heuristic reward functions. In a follow-up study, we show that the learned reward functions capture knowledge about long-term exploration and exploitation and generalize to different RL algorithms and to changes in the environment dynamics. 2) In temporal credit assignment, we explore methods based on pairwise weights that are functions of the state in which the action was taken, the state in which the reward was received, and the time elapsed in between (a schematic form of this weighting is sketched after the record below). We develop a metagradient algorithm for adapting these weights during policy learning. Our experiments show that our method achieves better performance than competing approaches. 3) In state representation learning, we investigate using random deep action-conditional prediction tasks as auxiliary tasks to help agents learn better state representations. Our experiments show that random deep action-conditional predictions can often yield better performance than handcrafted auxiliary tasks. 4) In model learning and planning, we develop a new method for learning value-equivalent models, a class of models that has recently demonstrated strong empirical performance; our method generalizes existing approaches. Our experiments show that it improves both model prediction accuracy and the control performance of the downstream planning procedure.
dc.language.iso: en_US
dc.subject: artificial intelligence
dc.subject: machine learning
dc.subject: reinforcement learning
dc.subject: deep learning
dc.title: Advances in Deep Reinforcement Learning: Intrinsic Rewards, Temporal Credit Assignment, State Representations, and Value-equivalent Models
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Computer Science & Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Baveja, Satinder Singh
dc.contributor.committeemember: Lewis, Richard L
dc.contributor.committeemember: Lee, Honglak
dc.contributor.committeemember: Silver, David
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbtoplevel: Engineering
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/174601/1/zeyu_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/6332
dc.identifier.orcid: 0000-0002-1101-5991
dc.identifier.name-orcid: Zheng, Zeyu; 0000-0002-1101-5991
dc.working.doi: 10.7302/6332
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
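
The pairwise credit-assignment weighting described in the abstract can be summarized schematically as follows. This is an editor's sketch of the general form only, assuming a learned weight function w_\eta with meta-parameters \eta that reweights each reward as a function of the action state, the reward state, and the elapsed time; the exact parameterization and metagradient update are given in the thesis.

\[
G_{w}(s_t, a_t) = \sum_{k \ge t} w_{\eta}\left(s_t, s_k, k - t\right) r_k,
\qquad
\nabla_{\theta} J(\theta) \approx \mathbb{E}\left[\sum_{t} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, G_{w}(s_t, a_t)\right],
\]

where s_t and a_t are the state and action at time t, r_k is the reward received at time k, \pi_\theta is the policy, and the meta-parameters \eta are adapted during policy learning by gradient descent on a meta-objective.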

