
REGAL Revisited: Regularized Reinforcement Learning for Weakly Communicating MDPs

dc.contributor.author: Tran, A. T.
dc.contributor.advisor: Tewari, Ambuj
dc.date.accessioned: 2023-05-25T16:01:07Z
dc.date.available: 2023-05-25T16:01:07Z
dc.date.issued: 2021
dc.identifier.uri: https://hdl.handle.net/2027.42/176692
dc.description.abstract: Markov Decision Processes (MDPs) are used extensively in artificial intelligence and reinforcement learning to describe how an agent interacts with its environment. An MDP specifies the possible states an agent can be in, the actions available to it, and the effects of taking those actions. By studying MDPs, we can develop techniques and algorithms that apply to many real-world problems where an agent must act intelligently. Many such algorithms assume that the dynamics of the MDP are known to the agent, i.e., the agent knows all the information it needs about the environment, including the possible states, actions, and effects of taking actions. This project focuses on settings where such information is unknown to the agent: it does not know how the world will behave when it takes a given action. The agent must therefore take actions it is unfamiliar with in order to gradually learn their effects. Several algorithms have been introduced to solve this problem, most notably UCRL2. However, the regret bound achieved by UCRL2 depends on the diameter of the MDP: the largest, over pairs of states, of the expected number of steps needed to reach one state from the other under the best policy for that pair (sketched formally below). In MDPs where some state is unreachable from another, the diameter is infinite, making UCRL2's guarantee vacuous. In this project, I build on an existing algorithm called REGAL, which has strong provable theoretical performance and achieves a regret bound that does not depend on the diameter. However, it cannot be implemented as a real computer program because it requires knowledge of certain quantities that are unavailable to the agent. This report presents a modification to the algorithm that makes it implementable while preserving the provable theoretical performance of the original.
dc.subject: reinforcement learning
dc.subject: Markov decision processes
dc.title: REGAL Revisited: Regularized Reinforcement Learning for Weakly Communicating MDPs
dc.type: Project
dc.subject.hlbtoplevel: Engineering
dc.description.peerreviewed: NA
dc.contributor.affiliationum: College of Engineering
dc.contributor.affiliationumcampus: Ann Arbor
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/176692/1/REGAL_Revisited_-_Anh_Tuan_Tran.pdf
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/176692/2/REGAL_Revisited_Poster_-_Anh_Tuan_Tran.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/7541
dc.working.doi: 10.7302/7541
dc.owningcollname: Honors Program, The College of Engineering
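
For reference, a minimal LaTeX sketch of the two quantities the abstract contrasts: the diameter D(M), which UCRL2's regret bound depends on, and the regret itself. The notation is assumed here rather than taken from the record: T(s -> s' | pi) is the number of steps a stationary policy pi takes to reach state s' from state s, rho* is the optimal average reward, and r_t is the reward received at step t.

\[
  D(M) \;=\; \max_{s \neq s'} \, \min_{\pi} \, \mathbb{E}\big[\, T(s \to s' \mid \pi) \,\big],
  \qquad
  \mathrm{Regret}(T) \;=\; T\rho^{*} \;-\; \sum_{t=1}^{T} r_t .
\]

If some state s' is unreachable from some state s, the inner expectation is infinite for that pair and D(M) is infinite, which is exactly the failure case the abstract describes. REGAL's bound instead scales with the span of the optimal bias function, a quantity that stays finite for weakly communicating MDPs.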


