Staff View: Experimental Design for Semiparametric Bandits :: Library Catalog

Bibliographic Details
Main Authors:	Kim, Seok-Jin; Kim, Gi-Soo; Oh, Min-hwan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2506.13390

_version_	1866912435385925632
author	Kim, Seok-Jin; Kim, Gi-Soo; Oh, Min-hwan
author_facet	Kim, Seok-Jin; Kim, Gi-Soo; Oh, Min-hwan
contents	We study finite-armed semiparametric bandits, where each arm's reward combines a linear component with an unknown, potentially adversarial shift. This model strictly generalizes classical linear bandits and reflects complexities common in practice. We propose the first experimental-design approach that simultaneously offers a sharp regret bound, a PAC bound, and a best-arm identification guarantee. Our method attains the minimax regret $\tilde{O}(\sqrt{dT})$, matching the known lower bound for finite-armed linear bandits, and further achieves logarithmic regret under a positive suboptimality gap condition. These guarantees follow from our refined non-asymptotic analysis of orthogonalized regression that attains the optimal $\sqrt{d}$ rate, paving the way for robust and efficient learning across a broad class of semiparametric bandit problems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_13390
institution	arXiv
publishDate	2025
record_format	arxiv
title	Experimental Design for Semiparametric Bandits
topic	Machine Learning
url	https://arxiv.org/abs/2506.13390

Similar Items