Conference Proceedings

Towards Q-learning the Whittle Index for Restless Bandits

Jing Fu, Yoni Nazarathy, Sarat Moka, Peter G Taylor

2019 AUSTRALIAN & NEW ZEALAND CONTROL CONFERENCE (ANZCC) | IEEE | Published: 2019

Abstract

We consider the multi-armed restless bandit problem (RMABP) with an infinite horizon average cost objective. Each arm of the RMABP is associated with a Markov process that operates in two modes: active and passive. At each time slot, a controller must designate a subset of the arms to be active; the processes associated with these arms then evolve differently than they would in the passive mode. Treated as an optimal control problem, the RMABP is known to be computationally intractable to solve optimally. In many cases, the Whittle index policy achieves near-optimal performance and can be tractably found. Nevertheless, computation of the Whittle indices requires knowledge of the transition matrices of th..


Grants

Awarded by Australian Research Council (ARC)


Awarded by ACEMS


Awarded by ARC


Funding Acknowledgements

J. Fu and P.G. Taylor's research is supported by the Australian Research Council (ARC) Laureate Fellowship FL130100039 and the ARC Centre of Excellence for the Mathematical and Statistical Frontiers (ACEMS). S. Moka's research is supported by ACEMS, under grant number CE140100049. Y. Nazarathy's research is supported by ARC grant DP180101602. The authors also thank Prof. Vivek Borkar for preliminary discussions.