Saved in:
Bibliographic Details
Main Authors: Jiang, Rong, Ma, Cong
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2511.03708
Tags: Add Tag
No Tags, Be the first to tag this record!
Table of Contents:
  • We study batched nonparametric contextual bandits under a margin condition when the margin parameter $α$ is unknown. To capture the statistical cost of this ignorance, we introduce the regret inflation criterion, defined as the ratio between the regret of an adaptive algorithm and that of an oracle knowing $α$. We show that the optimal regret inflation grows polynomially with the horizon $T$, with exponent given by the value of a convex optimization problem that depends on the dimension, smoothness, and number of batches $M$. Moreover, the minimizer of this optimization problem directly prescribes the batch allocation and exploration strategy of a rate-optimal algorithm. Building on this principle, we develop RoBIN (RObust batched algorithm with adaptive BINning), which achieves the optimal regret inflation up to polylogarithmic factors. These results reveal a new adaptivity barrier: under batching, adaptation to an unknown margin parameter inevitably incurs a polynomial penalty, sharply characterized by a variational problem. Remarkably, this barrier vanishes once the number of batches exceeds order $\log \log T$; with only a doubly logarithmic number of updates, one can recover the oracle regret rate up to polylogarithmic factors.