| Main Authors: | Wang, Shengbo, Sun, Hong, Li, Ke |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.09047 |
Similar Items
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
by: Verma, Arun, et al.
Published: (2024)
Linear and Neural Dueling Bandits with Delayed Feedback
by: Wang, Xiangyi, et al.
Published: (2026)
Biased Dueling Bandits with Stochastic Delayed Feedback
by: Yi, Bongsoo, et al.
Published: (2024)
Fusing Reward and Dueling Feedback in Stochastic Bandits
by: Wang, Xuchuang, et al.
Published: (2025)
Active Human Feedback Collection via Neural Contextual Dueling Bandits
by: Verma, Arun, et al.
Published: (2025)
Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit
by: Huang, Tian, et al.
Published: (2023)
Federated Linear Dueling Bandits
by: Huang, Xuhan, et al.
Published: (2025)
Online Clustering of Dueling Bandits
by: Wang, Zhiyong, et al.
Published: (2025)
Conversational Dueling Bandits in Generalized Linear Models
by: Yang, Shuhua, et al.
Published: (2024)
When Can We Track Significant Preference Shifts in Dueling Bandits?
by: Suk, Joe, et al.
Published: (2023)
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
by: Di, Qiwei, et al.
Published: (2024)
Multi-Player Approaches for Dueling Bandits
by: Raveh, Or, et al.
Published: (2024)
LLM Routing with Dueling Feedback
by: Chiang, Chao-Kai, et al.
Published: (2025)
Feel-Good Thompson Sampling for Contextual Dueling Bandits
by: Li, Xuheng, et al.
Published: (2024)
DP-Dueling: Learning from Preference Feedback without Compromising User Privacy
by: Saha, Aadirupa, et al.
Published: (2024)
The Sampling Complexity of Condorcet Winner Identification in Dueling Bandits
by: Saad, El Mehdi, et al.
Published: (2026)
Queueing Matching Bandits with Preference Feedback
by: Kim, Jung-hun, et al.
Published: (2024)
Recycling History: Efficient Recommendations from Contextual Dueling Bandits
by: Sankagiri, Suryanarayana, et al.
Published: (2025)
Utility-based Dueling Bandits as a Partial Monitoring Game
by: Gajane, Pratik, et al.
Published: (2015)
Best-of-Both-Worlds Multi-Dueling Bandits: Unified Algorithms for Stochastic and Adversarial Preferences under Condorcet and Borda Objectives
by: Akash, S, et al.
Published: (2026)
Beyond Numeric Rewards: In-Context Dueling Bandits with LLM Agents
by: Xia, Fanzeng, et al.
Published: (2024)
Lipschitz Dueling Bandits over Continuous Action Spaces
by: Sharma, Mudit, et al.
Published: (2026)
Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration
by: Oh, Youngmin, et al.
Published: (2025)
Non-Stationary Dueling Bandits Under a Weighted Borda Criterion
by: Suk, Joe, et al.
Published: (2024)
Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling
by: Cheng, Yuwei, et al.
Published: (2024)
Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits
by: Di, Qiwei, et al.
Published: (2023)
DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback
by: Xiong, Guojun, et al.
Published: (2024)
Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare
by: Ahmed, Maheed H., et al.
Published: (2026)
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
by: Hong, Ilgee, et al.
Published: (2024)
Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions
by: Oh, Youngmin
Published: (2026)
Bandits with Preference Feedback: A Stackelberg Game Perspective
by: Pásztor, Barna, et al.
Published: (2024)
Latent Preference Bandits
by: Mwai, Newton, et al.
Published: (2025)
"More Than Words": Linking Music Preferences and Moral Values Through Lyrics
by: Preniqi, Vjosa, et al.
Published: (2022)
Labels Matter More Than Models: Rethinking the Unsupervised Paradigm in Time Series Anomaly Detection
by: Zhong, Zhijie, et al.
Published: (2025)
Graph Feedback Bandits with Similar Arms
by: Qi, Han, et al.
Published: (2024)
Riemannian Dueling Optimization
by: Ren, Yuxuan, et al.
Published: (2026)
Learning to Play 7 Wonders Duel Without Human Supervision
by: Paolini, Giovanni, et al.
Published: (2024)
Rethinking Layer Redundancy: Calibration Matters More Than Search in LLM Depth Pruning
by: Kim, Minkyu, et al.
Published: (2026)
When Less Is More: Binary Feedback Can Outperform Ordinal Comparisons in Ranking Recovery
by: Xu, Shirong, et al.
Published: (2025)
Nearest Neighbour with Bandit Feedback
by: Pasteris, Stephen, et al.
Published: (2023)