MARC View: Reinforcement Learning and Regret Bounds for Admission Control :: Library Catalog

Bibliographic Details
Main Authors:	Weber, Lucas; Bušić, Ana; Zhu, Jiamin
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Optimization and Control
Online Access:	https://arxiv.org/abs/2406.04766

_version_	1866913381473058816
author	Weber, Lucas; Bušić, Ana; Zhu, Jiamin
author_facet	Weber, Lucas; Bušić, Ana; Zhu, Jiamin
contents	The expected regret of any reinforcement learning algorithm is lower bounded by $\Omega\left(\sqrt{DXAT}\right)$ for undiscounted returns, where $D$ is the diameter of the Markov decision process, $X$ the size of the state space, $A$ the size of the action space and $T$ the number of time steps. However, this lower bound is general. A smaller regret can be obtained by taking into account some specific knowledge of the problem structure. In this article, we consider an admission control problem to an $M/M/c/S$ queue with $m$ job classes and class-dependent rewards and holding costs. Queuing systems often have a diameter that is exponential in the buffer size $S$, making the previous lower bound prohibitive for any practical use. We propose an algorithm inspired by UCRL2, and use the structure of the problem to upper bound the expected total regret by $O(S\log T + \sqrt{mT \log T})$ in the finite server case. In the infinite server case, we prove that the dependence of the regret on $S$ disappears.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_04766
institution	arXiv
publishDate	2024
record_format	arxiv
title	Reinforcement Learning and Regret Bounds for Admission Control
topic	Machine Learning; Optimization and Control
url	https://arxiv.org/abs/2406.04766
