| Main Authors: | Zeng, Qirun; Wang, Xuchuang; Shen, Jiayi; Liu, Xutong; Kong, Fang; Zuo, Jinhang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2605.05745 |
| Field | Value |
|---|---|
| author | Zeng, Qirun; Wang, Xuchuang; Shen, Jiayi; Liu, Xutong; Kong, Fang; Zuo, Jinhang |
| contents | We study fixed-confidence best arm identification in generalized linear bandits under a hybrid feedback model: at each round, the learner may query either (i) absolute reward feedback from a single arm or (ii) relative (dueling) feedback from an arm pair, both governed by generalized linear models. We introduce a likelihood-ratio-based confidence sequence that unifies heterogeneous generalized linear observations and yields an explicit ellipsoidal confidence set under a self-concordance assumption. Building on this confidence set, we propose a hybrid Track-and-Stop algorithm that adaptively allocates queries by tracking a minimax-optimal design over a joint action space of arms and pairs. We establish $\delta$-correctness and provide high-probability upper bounds on the stopping time. We further extend the framework to a cost-aware setting that accounts for heterogeneous acquisition costs across feedback modalities. Empirical experiments demonstrate that the proposed algorithms significantly improve sample efficiency over baseline methods. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2605_05745 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Best Arm Identification in Generalized Linear Bandits via Hybrid Feedback |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2605.05745 |
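The allocation step described in the abstract (a design over a joint space of single arms and arm pairs, which Track-and-Stop then tracks) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' method or code: a logistic link for both feedback types, pairwise arm differences as the directions to discriminate, a Frank-Wolfe heuristic for the minimax design, and a D-tracking-style sampling rule. The names `joint_actions`, `minimax_design`, and `track` are hypothetical.

```python
import itertools

import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def joint_actions(X):
    """Hypothetical joint action space: each arm i (absolute feedback, feature x_i)
    and each unordered pair i<j (dueling feedback, assumed feature x_i - x_j)."""
    K = X.shape[0]
    feats, labels = [], []
    for i in range(K):
        feats.append(X[i])
        labels.append(("arm", i))
    for i, j in itertools.combinations(range(K), 2):
        feats.append(X[i] - X[j])
        labels.append(("pair", i, j))
    return np.array(feats), labels


def minimax_design(A, theta, Y, iters=500, reg=1e-8):
    """Frank-Wolfe heuristic for min_lambda max_{y in Y} ||y||^2_{V(lambda)^{-1}},
    where V(lambda) = sum_a lambda_a * mu'(a^T theta) * a a^T.
    Assumption: logistic link mu for both feedback types, so mu' = mu * (1 - mu)."""
    n, d = A.shape
    mu = sigmoid(A @ theta)
    w = mu * (1.0 - mu)                            # mu'(a^T theta) per action
    lam = np.full(n, 1.0 / n)                      # start from the uniform design
    for t in range(iters):
        V = (A * (lam * w)[:, None]).T @ A + reg * np.eye(d)
        Vinv = np.linalg.inv(V)
        scores = np.einsum("ij,jk,ik->i", Y, Vinv, Y)  # ||y||^2_{V^{-1}} for each y
        y = Y[np.argmax(scores)]                       # currently worst direction
        # Negative gradient of y^T V^{-1} y with respect to lambda_a.
        gain = w * (A @ (Vinv @ y)) ** 2
        e = np.zeros(n)
        e[np.argmax(gain)] = 1.0
        gamma = 2.0 / (t + 3)                          # step size keeps lam on the simplex
        lam = (1.0 - gamma) * lam + gamma * e
    return lam


def track(lam, counts):
    """D-tracking-style rule: query the action most under-sampled relative to t * lambda."""
    t = counts.sum() + 1
    return int(np.argmax(t * lam - counts))


if __name__ == "__main__":
    X = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])   # toy arm features
    theta = np.array([1.0, 0.5])                         # plug-in parameter estimate
    A, labels = joint_actions(X)
    Y = np.array([X[i] - X[j] for i, j in itertools.combinations(range(3), 2)])
    lam = minimax_design(A, theta, Y)
    for lab, p in zip(labels, lam):
        print(lab, round(float(p), 3))
```

In this sketch a dueling query on pair (i, j) has feature x_i - x_j, so it probes exactly a direction that separates arms i and j; the design can therefore place weight on pairs as well as on single arms. The confidence sequence, the stopping rule, and the cost-aware variant are not modeled here.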