Enregistré dans:
Détails bibliographiques
Auteurs principaux: Weber, Lucas, Bušić, Ana, Zhu, Jiamin
Format: Preprint
Publié: 2024
Sujets:
Accès en ligne:https://arxiv.org/abs/2406.04766
Tags: Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
_version_ 1866913381473058816
author Weber, Lucas
Bušić, Ana
Zhu, Jiamin
author_facet Weber, Lucas
Bušić, Ana
Zhu, Jiamin
contents The expected regret of any reinforcement learning algorithm is lower bounded by $Ω\left(\sqrt{DXAT}\right)$ for undiscounted returns, where $D$ is the diameter of the Markov decision process, $X$ the size of the state space, $A$ the size of the action space and $T$ the number of time steps. However, this lower bound is general. A smaller regret can be obtained by taking into account some specific knowledge of the problem structure. In this article, we consider an admission control problem to an $M/M/c/S$ queue with $m$ job classes and class-dependent rewards and holding costs. Queuing systems often have a diameter that is exponential in the buffer size $S$, making the previous lower bound prohibitive for any practical use. We propose an algorithm inspired by UCRL2, and use the structure of the problem to upper bound the expected total regret by $O(S\log T + \sqrt{mT \log T})$ in the finite server case. In the infinite server case, we prove that the dependence of the regret on $S$ disappears.
format Preprint
id arxiv_https___arxiv_org_abs_2406_04766
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Reinforcement Learning and Regret Bounds for Admission Control
Weber, Lucas
Bušić, Ana
Zhu, Jiamin
Machine Learning
Optimization and Control
The expected regret of any reinforcement learning algorithm is lower bounded by $Ω\left(\sqrt{DXAT}\right)$ for undiscounted returns, where $D$ is the diameter of the Markov decision process, $X$ the size of the state space, $A$ the size of the action space and $T$ the number of time steps. However, this lower bound is general. A smaller regret can be obtained by taking into account some specific knowledge of the problem structure. In this article, we consider an admission control problem to an $M/M/c/S$ queue with $m$ job classes and class-dependent rewards and holding costs. Queuing systems often have a diameter that is exponential in the buffer size $S$, making the previous lower bound prohibitive for any practical use. We propose an algorithm inspired by UCRL2, and use the structure of the problem to upper bound the expected total regret by $O(S\log T + \sqrt{mT \log T})$ in the finite server case. In the infinite server case, we prove that the dependence of the regret on $S$ disappears.
title Reinforcement Learning and Regret Bounds for Admission Control
topic Machine Learning
Optimization and Control
url https://arxiv.org/abs/2406.04766