Staff View: Best Arm Identification in Generalized Linear Bandits via Hybrid Feedback :: Library Catalog

Bibliographic Details
Main Authors:	Zeng, Qirun; Wang, Xuchuang; Shen, Jiayi; Liu, Xutong; Kong, Fang; Zuo, Jinhang
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.05745

_version_	1866915987237896192
author	Zeng, Qirun; Wang, Xuchuang; Shen, Jiayi; Liu, Xutong; Kong, Fang; Zuo, Jinhang
author_facet	Zeng, Qirun; Wang, Xuchuang; Shen, Jiayi; Liu, Xutong; Kong, Fang; Zuo, Jinhang
contents	We study fixed-confidence best arm identification in generalized linear bandits under a hybrid feedback model: at each round, the learner may query either (i) absolute reward feedback from a single arm or (ii) relative (dueling) feedback from an arm pair, both governed by generalized linear models. We introduce a likelihood-ratio-based confidence sequence that unifies heterogeneous generalized linear observations and yields an explicit ellipsoidal confidence set under a self-concordance assumption. Building on this confidence set, we propose a hybrid Track-and-Stop algorithm that adaptively allocates queries by tracking a minimax-optimal design over a joint action space of arms and pairs. We establish $δ$-correctness and provide high-probability upper bounds on the stopping time. We further extend the framework to a cost-aware setting that accounts for heterogeneous acquisition costs across feedback modalities. Empirical experiments demonstrate that the proposed algorithms significantly improve sample efficiency over baseline methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_05745
institution	arXiv
publishDate	2026
record_format	arxiv
title	Best Arm Identification in Generalized Linear Bandits via Hybrid Feedback
topic	Artificial Intelligence
url	https://arxiv.org/abs/2605.05745

Similar Items