An Asymptotically Optimal Heuristic for General Non-Stationary Finite-Horizon Restless Multi-Armed Multi-Action Bandits
dc.contributor.author | Zayas-Caban, Gabriel | |
dc.contributor.author | Jasin, Stefanus | |
dc.contributor.author | Wang, Guihua | |
dc.date.accessioned | 2017-10-25T16:05:35Z | |
dc.date.available | 2017-10-25T16:05:35Z | |
dc.date.issued | 2017-03 | |
dc.identifier | 1374 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/138941 | |
dc.description.abstract | We propose an asymptotically optimal heuristic, which we termed the Randomized Assignment Control (RAC) for restless multi-armed bandit problems with discrete-time and fi nite states. It is based on a linear programming relaxation to the original stochastic control formulation. In contrast to most of the existing literature, we consider a fi nite horizon with multiple actions and time-dependent (i.e. non-stationary) upper bound on the total number of bandits that can be activated each time period. The asymptotic setting is obtained by letting the number of bandits and other related parameters grow to in finity. Our main contribution is that the asymptotic optimality of RAC in this general setting does not require indexability properties or the usual stability conditions of the underlying Markov chain (e.g. unichain) or fluid approximation (e.g. global stable attractor). Moreover, our multi-action setting is not restricted to the usual dominant action concept. Numerical simulations con firms that our proposed policy indeed performs well in the asymptotic setting. Perhaps more surprisingly, these simulations show that RAC performs well in the non-asymptotic setting as well. Finally, we show that RAC is asymptotically optimal for a dynamic population, where bandits can randomly arrive and depart the system, and discuss how our framework extends to more general costs and constraints. | en_US |
dc.subject | Multi-Armed Multi-Action Bandits | en_US |
dc.subject.classification | Operations and Management Science | en_US |
dc.title | An Asymptotically Optimal Heuristic for General Non-Stationary Finite-Horizon Restless Multi-Armed Multi-Action Bandits | en_US |
dc.type | Working Paper | en_US |
dc.subject.hlbtoplevel | Business | |
dc.contributor.affiliationum | Ross School of Business | en_US |
dc.contributor.affiliationum | Center for Healthcare Engineering and Patient Safety | en_US |
dc.contributor.affiliationumcampus | Ann Arbor | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/138941/1/1374_Jasin.pdf | |
dc.owningcollname | Business, Stephen M. Ross School of - Working Papers Series |
Files in this item
Remediation of Harmful Language
The University of Michigan Library aims to describe library materials in a way that respects the people and communities who create, use, and are represented in our collections. Report harmful or offensive language in catalog records, finding aids, or elsewhere in our collections anonymously through our metadata feedback form. More information at Remediation of Harmful Language.
Accessibility
If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.