Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Montanari, Giovanni, Fiegel, Côme, Pla, Corentin, Saha, Aadirupa, Perchet, Vianney
Format:	Preprint
Published:	2026
Subjects:	Machine Learning Computer Science and Game Theory
Online Access:	https://arxiv.org/abs/2602.06565
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866908817873174528
author	Montanari, Giovanni Fiegel, Côme Pla, Corentin Saha, Aadirupa Perchet, Vianney
author_facet	Montanari, Giovanni Fiegel, Côme Pla, Corentin Saha, Aadirupa Perchet, Vianney
contents	We study the online resource allocation problem in which at each round, a budget $B$ must be allocated across $K$ arms under censored feedback. An arm yields a reward if and only if two conditions are satisfied: (i) the arm is activated according to an arm-specific Bernoulli random variable with unknown parameter, and (ii) the allocated budget exceeds a random threshold drawn from a parametric distribution with unknown parameter. Over $T$ rounds, the learner must jointly estimate the unknown parameters and allocate the budget so as to maximize cumulative reward facing the exploration--exploitation trade-off. We prove an information-theoretic regret lower bound $Ω(T^{1/3})$, demonstrating the intrinsic difficulty of the problem. We then propose RA-UCB, an optimistic algorithm that leverages non-trivial parameter estimation and confidence bounds. When the budget $B$ is known at the beginning of each round, RA-UCB achieves a regret of order $\widetilde{\mathcal{O}}(\sqrt{T})$, and even $\mathcal{O}(\mathrm{poly}\text{-}\log T)$ under stronger assumptions. As for unknown, round dependent budget, we introduce MG-UCB, which allows within-round switching and infinitesimal allocations, and matches the regret guarantees of RA-UCB. We then validate our theoretical results through experiments on real-world datasets.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_06565
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Learning to Allocate Resources with Censored Feedback Montanari, Giovanni Fiegel, Côme Pla, Corentin Saha, Aadirupa Perchet, Vianney Machine Learning Computer Science and Game Theory We study the online resource allocation problem in which at each round, a budget $B$ must be allocated across $K$ arms under censored feedback. An arm yields a reward if and only if two conditions are satisfied: (i) the arm is activated according to an arm-specific Bernoulli random variable with unknown parameter, and (ii) the allocated budget exceeds a random threshold drawn from a parametric distribution with unknown parameter. Over $T$ rounds, the learner must jointly estimate the unknown parameters and allocate the budget so as to maximize cumulative reward facing the exploration--exploitation trade-off. We prove an information-theoretic regret lower bound $Ω(T^{1/3})$, demonstrating the intrinsic difficulty of the problem. We then propose RA-UCB, an optimistic algorithm that leverages non-trivial parameter estimation and confidence bounds. When the budget $B$ is known at the beginning of each round, RA-UCB achieves a regret of order $\widetilde{\mathcal{O}}(\sqrt{T})$, and even $\mathcal{O}(\mathrm{poly}\text{-}\log T)$ under stronger assumptions. As for unknown, round dependent budget, we introduce MG-UCB, which allows within-round switching and infinitesimal allocations, and matches the regret guarantees of RA-UCB. We then validate our theoretical results through experiments on real-world datasets.
title	Learning to Allocate Resources with Censored Feedback
topic	Machine Learning Computer Science and Game Theory
url	https://arxiv.org/abs/2602.06565

Similar Items