Saved in:
Bibliographic Details
Main Authors: Montanari, Giovanni, Fiegel, Côme, Pla, Corentin, Saha, Aadirupa, Perchet, Vianney
Format: Preprint
Published: 2026
Subjects:
Online Access:https://arxiv.org/abs/2602.06565
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866908817873174528
author Montanari, Giovanni
Fiegel, Côme
Pla, Corentin
Saha, Aadirupa
Perchet, Vianney
author_facet Montanari, Giovanni
Fiegel, Côme
Pla, Corentin
Saha, Aadirupa
Perchet, Vianney
contents We study the online resource allocation problem in which at each round, a budget $B$ must be allocated across $K$ arms under censored feedback. An arm yields a reward if and only if two conditions are satisfied: (i) the arm is activated according to an arm-specific Bernoulli random variable with unknown parameter, and (ii) the allocated budget exceeds a random threshold drawn from a parametric distribution with unknown parameter. Over $T$ rounds, the learner must jointly estimate the unknown parameters and allocate the budget so as to maximize cumulative reward facing the exploration--exploitation trade-off. We prove an information-theoretic regret lower bound $Ω(T^{1/3})$, demonstrating the intrinsic difficulty of the problem. We then propose RA-UCB, an optimistic algorithm that leverages non-trivial parameter estimation and confidence bounds. When the budget $B$ is known at the beginning of each round, RA-UCB achieves a regret of order $\widetilde{\mathcal{O}}(\sqrt{T})$, and even $\mathcal{O}(\mathrm{poly}\text{-}\log T)$ under stronger assumptions. As for unknown, round dependent budget, we introduce MG-UCB, which allows within-round switching and infinitesimal allocations, and matches the regret guarantees of RA-UCB. We then validate our theoretical results through experiments on real-world datasets.
format Preprint
id arxiv_https___arxiv_org_abs_2602_06565
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Learning to Allocate Resources with Censored Feedback
Montanari, Giovanni
Fiegel, Côme
Pla, Corentin
Saha, Aadirupa
Perchet, Vianney
Machine Learning
Computer Science and Game Theory
We study the online resource allocation problem in which at each round, a budget $B$ must be allocated across $K$ arms under censored feedback. An arm yields a reward if and only if two conditions are satisfied: (i) the arm is activated according to an arm-specific Bernoulli random variable with unknown parameter, and (ii) the allocated budget exceeds a random threshold drawn from a parametric distribution with unknown parameter. Over $T$ rounds, the learner must jointly estimate the unknown parameters and allocate the budget so as to maximize cumulative reward facing the exploration--exploitation trade-off. We prove an information-theoretic regret lower bound $Ω(T^{1/3})$, demonstrating the intrinsic difficulty of the problem. We then propose RA-UCB, an optimistic algorithm that leverages non-trivial parameter estimation and confidence bounds. When the budget $B$ is known at the beginning of each round, RA-UCB achieves a regret of order $\widetilde{\mathcal{O}}(\sqrt{T})$, and even $\mathcal{O}(\mathrm{poly}\text{-}\log T)$ under stronger assumptions. As for unknown, round dependent budget, we introduce MG-UCB, which allows within-round switching and infinitesimal allocations, and matches the regret guarantees of RA-UCB. We then validate our theoretical results through experiments on real-world datasets.
title Learning to Allocate Resources with Censored Feedback
topic Machine Learning
Computer Science and Game Theory
url https://arxiv.org/abs/2602.06565