An Asymptotically Optimal Heuristic for General Non-Stationary Finite-Horizon Restless Multi-Armed Multi-Action Bandits

Zayas-Caban, Gabriel; Jasin, Stefanus; Wang, Guihua

An Asymptotically Optimal Heuristic for General Non-Stationary Finite-Horizon Restless Multi-Armed Multi-Action Bandits

dc.contributor.author	Zayas-Caban, Gabriel
dc.contributor.author	Jasin, Stefanus
dc.contributor.author	Wang, Guihua
dc.date.accessioned	2017-10-25T16:05:35Z
dc.date.available	2017-10-25T16:05:35Z
dc.date.issued	2017-03
dc.identifier	1374	en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/138941
dc.description.abstract	We propose an asymptotically optimal heuristic, which we termed the Randomized Assignment Control (RAC) for restless multi-armed bandit problems with discrete-time and fi nite states. It is based on a linear programming relaxation to the original stochastic control formulation. In contrast to most of the existing literature, we consider a fi nite horizon with multiple actions and time-dependent (i.e. non-stationary) upper bound on the total number of bandits that can be activated each time period. The asymptotic setting is obtained by letting the number of bandits and other related parameters grow to in finity. Our main contribution is that the asymptotic optimality of RAC in this general setting does not require indexability properties or the usual stability conditions of the underlying Markov chain (e.g. unichain) or fluid approximation (e.g. global stable attractor). Moreover, our multi-action setting is not restricted to the usual dominant action concept. Numerical simulations con firms that our proposed policy indeed performs well in the asymptotic setting. Perhaps more surprisingly, these simulations show that RAC performs well in the non-asymptotic setting as well. Finally, we show that RAC is asymptotically optimal for a dynamic population, where bandits can randomly arrive and depart the system, and discuss how our framework extends to more general costs and constraints.	en_US
dc.subject	Multi-Armed Multi-Action Bandits	en_US
dc.subject.classification	Operations and Management Science	en_US
dc.title	An Asymptotically Optimal Heuristic for General Non-Stationary Finite-Horizon Restless Multi-Armed Multi-Action Bandits	en_US
dc.type	Working Paper	en_US
dc.subject.hlbtoplevel	Business
dc.contributor.affiliationum	Ross School of Business	en_US
dc.contributor.affiliationum	Center for Healthcare Engineering and Patient Safety	en_US
dc.contributor.affiliationumcampus	Ann Arbor
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/138941/1/1374_Jasin.pdf
dc.owningcollname	Business, Stephen M. Ross School of - Working Papers Series

Files in this item

Name:: 1374_Jasin.pdf
Size:: 380.1KB
Format:: PDF

View/Open

Business, Stephen M. Ross School of - Working Papers Series

Show simple item record

Remediation of Harmful Language

The University of Michigan Library aims to describe library materials in a way that respects the people and communities who create, use, and are represented in our collections. Report harmful or offensive language in catalog records, finding aids, or elsewhere in our collections anonymously through our metadata feedback form. More information at Remediation of Harmful Language.

Accessibility

If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.