Bibliographic Details
Main Authors: Zeng, Qirun, Wang, Xuchuang, Shen, Jiayi, Liu, Xutong, Kong, Fang, Zuo, Jinhang
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2605.05745
author Zeng, Qirun
Wang, Xuchuang
Shen, Jiayi
Liu, Xutong
Kong, Fang
Zuo, Jinhang
contents We study fixed-confidence best arm identification in generalized linear bandits under a hybrid feedback model: at each round, the learner may query either (i) absolute reward feedback from a single arm or (ii) relative (dueling) feedback from an arm pair, both governed by generalized linear models. We introduce a likelihood-ratio-based confidence sequence that unifies heterogeneous generalized linear observations and yields an explicit ellipsoidal confidence set under a self-concordance assumption. Building on this confidence set, we propose a hybrid Track-and-Stop algorithm that adaptively allocates queries by tracking a minimax-optimal design over a joint action space of arms and pairs. We establish $\delta$-correctness and provide high-probability upper bounds on the stopping time. We further extend the framework to a cost-aware setting that accounts for heterogeneous acquisition costs across feedback modalities. Empirical experiments demonstrate that the proposed algorithms significantly improve sample efficiency over baseline methods.
format Preprint
id arxiv_https___arxiv_org_abs_2605_05745
institution arXiv
publishDate 2026
record_format arxiv
title Best Arm Identification in Generalized Linear Bandits via Hybrid Feedback
topic Artificial Intelligence
url https://arxiv.org/abs/2605.05745