
REGAL Revisited: Regularized Reinforcement Learning for Weakly Communicating MDPs

dc.contributor.author: Tran, A. T.
dc.contributor.advisor: Tewari, Ambuj
dc.date.accessioned: 2023-05-25T16:01:07Z
dc.date.available: 2023-05-25T16:01:07Z
dc.date.issued: 2021
dc.identifier.uri: https://hdl.handle.net/2027.42/176692
dc.description.abstract: Markov Decision Processes (MDPs) are used extensively in artificial intelligence and reinforcement learning to describe how an agent interacts with its environment. An MDP specifies the possible states an agent can be in, the actions available to it, and the effects of taking those actions. By studying MDPs, we can develop techniques and algorithms that apply to many real-world problems where an agent must act intelligently. Many such algorithms assume that the dynamics of the MDP are known to the agent, i.e., the agent knows all the information it needs about the environment, including the possible states, actions, and effects of taking actions. This project focuses on settings where such information is unknown to the agent: it does not know how the world will behave when it takes a given action. The agent must therefore take actions it is unfamiliar with in order to gradually learn their effects. Several algorithms have been introduced to solve this problem, most notably UCRL2. However, the regret bound achieved by UCRL2 depends on the diameter of the MDP: the largest, over pairs of states, of the expected number of steps needed to reach one state from the other under the best policy for that pair (sketched formally below). In MDPs where some state is unreachable from another, the diameter is infinite, making UCRL2's guarantee vacuous. In this project, I build on an existing algorithm called REGAL, which has strong provable theoretical performance and achieves a regret bound that does not depend on the diameter. However, it cannot be implemented as a real computer program because it requires knowledge of certain quantities that are unavailable to the agent. This report presents a modification to the algorithm that makes it implementable while preserving the provable theoretical performance of the original.
dc.subject: reinforcement learning
dc.subject: Markov decision processes
dc.title: REGAL Revisited: Regularized Reinforcement Learning for Weakly Communicating MDPs
dc.type: Project
dc.subject.hlbtoplevel: Engineering
dc.description.peerreviewed: NA
dc.contributor.affiliationum: College of Engineering
dc.contributor.affiliationumcampus: Ann Arbor
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/176692/1/REGAL_Revisited_-_Anh_Tuan_Tran.pdf
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/176692/2/REGAL_Revisited_Poster_-_Anh_Tuan_Tran.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/7541
dc.working.doi: 10.7302/7541
dc.owningcollname: Honors Program, The College of Engineering
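
For reference, a minimal LaTeX sketch of the two quantities the abstract contrasts: the diameter D(M), which UCRL2's regret bound depends on, and the regret itself. The notation is assumed here rather than taken from the record: T(s -> s' | pi) is the number of steps a stationary policy pi takes to reach state s' from state s, rho* is the optimal average reward, and r_t is the reward received at step t.

\[
  D(M) \;=\; \max_{s \neq s'} \, \min_{\pi} \, \mathbb{E}\big[\, T(s \to s' \mid \pi) \,\big],
  \qquad
  \mathrm{Regret}(T) \;=\; T\rho^{*} \;-\; \sum_{t=1}^{T} r_t .
\]

If some state s' is unreachable from some state s, the inner expectation is infinite for that pair and D(M) is infinite, which is exactly the failure case the abstract describes. REGAL's bound instead scales with the span of the optimal bias function, a quantity that stays finite for weakly communicating MDPs.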


