Library Catalog

Bibliographic Details
Main Authors:	Wang, Shengbo; Sun, Hong; Li, Ke
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2511.09047

Similar Items

Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
by: Verma, Arun, et al.
Published: (2024)

Linear and Neural Dueling Bandits with Delayed Feedback
by: Wang, Xiangyi, et al.
Published: (2026)

Biased Dueling Bandits with Stochastic Delayed Feedback
by: Yi, Bongsoo, et al.
Published: (2024)

Fusing Reward and Dueling Feedback in Stochastic Bandits
by: Wang, Xuchuang, et al.
Published: (2025)

Active Human Feedback Collection via Neural Contextual Dueling Bandits
by: Verma, Arun, et al.
Published: (2025)

Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit
by: Huang, Tian, et al.
Published: (2023)

Federated Linear Dueling Bandits
by: Huang, Xuhan, et al.
Published: (2025)

Online Clustering of Dueling Bandits
by: Wang, Zhiyong, et al.
Published: (2025)

Conversational Dueling Bandits in Generalized Linear Models
by: Yang, Shuhua, et al.
Published: (2024)

When Can We Track Significant Preference Shifts in Dueling Bandits?
by: Suk, Joe, et al.
Published: (2023)

Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
by: Di, Qiwei, et al.
Published: (2024)

Multi-Player Approaches for Dueling Bandits
by: Raveh, Or, et al.
Published: (2024)

LLM Routing with Dueling Feedback
by: Chiang, Chao-Kai, et al.
Published: (2025)

Feel-Good Thompson Sampling for Contextual Dueling Bandits
by: Li, Xuheng, et al.
Published: (2024)

DP-Dueling: Learning from Preference Feedback without Compromising User Privacy
by: Saha, Aadirupa, et al.
Published: (2024)

The Sampling Complexity of Condorcet Winner Identification in Dueling Bandits
by: Saad, El Mehdi, et al.
Published: (2026)

Queueing Matching Bandits with Preference Feedback
by: Kim, Jung-hun, et al.
Published: (2024)

Recycling History: Efficient Recommendations from Contextual Dueling Bandits
by: Sankagiri, Suryanarayana, et al.
Published: (2025)

Utility-based Dueling Bandits as a Partial Monitoring Game
by: Gajane, Pratik, et al.
Published: (2015)

Best-of-Both-Worlds Multi-Dueling Bandits: Unified Algorithms for Stochastic and Adversarial Preferences under Condorcet and Borda Objectives
by: Akash, S, et al.
Published: (2026)

Beyond Numeric Rewards: In-Context Dueling Bandits with LLM Agents
by: Xia, Fanzeng, et al.
Published: (2024)

Lipschitz Dueling Bandits over Continuous Action Spaces
by: Sharma, Mudit, et al.
Published: (2026)

Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration
by: Oh, Youngmin, et al.
Published: (2025)

Non-Stationary Dueling Bandits Under a Weighted Borda Criterion
by: Suk, Joe, et al.
Published: (2024)

Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling
by: Cheng, Yuwei, et al.
Published: (2024)

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits
by: Di, Qiwei, et al.
Published: (2023)

DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback
by: Xiong, Guojun, et al.
Published: (2024)

Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare
by: Ahmed, Maheed H., et al.
Published: (2026)

Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
by: Hong, Ilgee, et al.
Published: (2024)

Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions
by: Oh, Youngmin
Published: (2026)

Bandits with Preference Feedback: A Stackelberg Game Perspective
by: Pásztor, Barna, et al.
Published: (2024)

Latent Preference Bandits
by: Mwai, Newton, et al.
Published: (2025)

"More Than Words": Linking Music Preferences and Moral Values Through Lyrics
by: Preniqi, Vjosa, et al.
Published: (2022)

Labels Matter More Than Models: Rethinking the Unsupervised Paradigm in Time Series Anomaly Detection
by: Zhong, Zhijie, et al.
Published: (2025)

Graph Feedback Bandits with Similar Arms
by: Qi, Han, et al.
Published: (2024)

Riemannian Dueling Optimization
by: Ren, Yuxuan, et al.
Published: (2026)

Learning to Play 7 Wonders Duel Without Human Supervision
by: Paolini, Giovanni, et al.
Published: (2024)

Rethinking Layer Redundancy: Calibration Matters More Than Search in LLM Depth Pruning
by: Kim, Minkyu, et al.
Published: (2026)

When Less Is More: Binary Feedback Can Outperform Ordinal Comparisons in Ranking Recovery
by: Xu, Shirong, et al.
Published: (2025)

Nearest Neighbour with Bandit Feedback
by: Pasteris, Stephen, et al.
Published: (2023)