
On the Importance of Inherent Structural Properties for Learning in Markov Decision Processes

dc.contributor.authorAdler, Saghar
dc.date.accessioned2024-05-22T17:24:56Z
dc.date.available2024-05-22T17:24:56Z
dc.date.issued2024
dc.date.submitted2024
dc.identifier.urihttps://hdl.handle.net/2027.42/193341
dc.description.abstractRecently, reinforcement learning methods have been applied to sequential decision-making problems in fields such as robotics and autonomous control, communication and networking, and resource allocation and scheduling. Despite great practical success, progress on theoretical performance guarantees for such complex systems has been limited. This dissertation addresses the limitations of current theoretical frameworks and extends the applicability of learning-based control methods to these more complex, real-life domains. This objective is achieved in two settings by exploiting the inherent structural properties of the Markov decision processes used to model such systems. In the first setting, admission control in systems modeled by the Erlang-B blocking model with unknown arrival and service rates, we use model knowledge to compensate for the lack of reward signals. We propose a learning algorithm based on self-tuning adaptive control, prove that it is asymptotically optimal, and provide finite-time regret guarantees. The second setting develops a framework for applying reinforcement learning to Markov decision processes with countably infinite state spaces and unbounded cost functions. An existing learning algorithm based on Thompson sampling with dynamically sized episodes is extended to countably infinite state spaces using the ergodicity properties of the Markov decision process. We establish asymptotic optimality of our learning-based control policy by proving a regret guarantee that is sub-linear in the time horizon. Our framework focuses on models that arise in queueing models of communication networks, computing systems, and processing networks; to demonstrate its applicability, we also apply it to the control of two queueing systems with unknown dynamics.
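As background for the first setting described in the abstract: the Erlang-B model is the classical M/M/c/c loss system, whose steady-state blocking probability has a standard, numerically stable recursion, B(0, a) = 1 and B(n, a) = a·B(n−1, a) / (n + a·B(n−1, a)), where a is the offered load (arrival rate divided by service rate). The sketch below is illustrative context only, not the dissertation's learning algorithm; the function name and interface are assumptions.

```python
def erlang_b(offered_load: float, num_servers: int) -> float:
    """Blocking probability B(c, a) of an M/M/c/c (Erlang-B) system.

    Uses the standard recursion
        B(0, a) = 1
        B(n, a) = a * B(n-1, a) / (n + a * B(n-1, a)),
    which avoids the overflow-prone factorials in the closed-form
    Erlang-B formula. `offered_load` is a = arrival rate / service rate.
    """
    b = 1.0  # B(0, a) = 1: with zero servers every arrival is blocked
    for n in range(1, num_servers + 1):
        b = offered_load * b / (n + offered_load * b)
    return b
```

For example, with offered load a = 1 and c = 2 servers the recursion gives B(1) = 1/2 and B(2) = 0.5/2.5 = 0.2, i.e., 20% of arrivals are blocked. In the dissertation's setting the arrival and service rates (and hence a) are unknown and must be learned.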
dc.language.isoen_US
dc.subjectReinforcement Learning
dc.subjectLearning in queueing systems
dc.titleOn the Importance of Inherent Structural Properties for Learning in Markov Decision Processes
dc.typeThesis
dc.description.thesisdegreenamePhD
dc.description.thesisdegreedisciplineElectrical and Computer Engineering
dc.description.thesisdegreegrantorUniversity of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeememberSubramanian, Vijay Gautam
dc.contributor.committeememberCohen, Asaf
dc.contributor.committeememberLiu, Mingyan
dc.contributor.committeememberYing, Lei
dc.subject.hlbsecondlevelElectrical Engineering
dc.subject.hlbtoplevelEngineering
dc.contributor.affiliationumcampusAnn Arbor
dc.description.bitstreamurlhttp://deepblue.lib.umich.edu/bitstream/2027.42/193341/1/ssaghar_1.pdf
dc.identifier.doihttps://dx.doi.org/10.7302/22986
dc.identifier.orcid0009-0004-8214-0222
dc.identifier.name-orcidAdler, Saghar; 0009-0004-8214-0222
dc.working.doi10.7302/22986
dc.owningcollnameDissertations and Theses (Ph.D. and Master's)


