Library Catalog

Bibliographic Details
Main Authors:	Cheng, Yuwei; Yao, Fan; Liu, Xuefeng; Xu, Haifeng
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2405.11204

Similar Items

Robust Reinforcement Learning from Corrupted Human Feedback
by: Bukharin, Alexander, et al.
Published: (2024)

Biased Dueling Bandits with Stochastic Delayed Feedback
by: Yi, Bongsoo, et al.
Published: (2024)

Corruption Robust Offline Reinforcement Learning with Human Feedback
by: Mandal, Debmalya, et al.
Published: (2024)

LLM Routing with Dueling Feedback
by: Chiang, Chao-Kai, et al.
Published: (2025)

Learning Personalized Ad Impact via Contextual Reinforcement Learning under Delayed Rewards
by: Cheng, Yuwei, et al.
Published: (2025)

Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions
by: Oh, Youngmin
Published: (2026)

Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
by: Verma, Arun, et al.
Published: (2024)

Active Human Feedback Collection via Neural Contextual Dueling Bandits
by: Verma, Arun, et al.
Published: (2025)

Fusing Reward and Dueling Feedback in Stochastic Bandits
by: Wang, Xuchuang, et al.
Published: (2025)

Preference is More Than Comparisons: Rethinking Dueling Bandits with Augmented Human Feedback
by: Wang, Shengbo, et al.
Published: (2025)

DP-Dueling: Learning from Preference Feedback without Compromising User Privacy
by: Saha, Aadirupa, et al.
Published: (2024)

Linear and Neural Dueling Bandits with Delayed Feedback
by: Wang, Xiangyi, et al.
Published: (2026)

Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
by: Di, Qiwei, et al.
Published: (2024)

Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
by: Yi, Bingji, et al.
Published: (2025)

Out-of-Distribution Learning with Human Feedback
by: Bai, Haoyue, et al.
Published: (2024)

Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback
by: Nika, Andi, et al.
Published: (2026)

Fine-Tuning Improves Information Conveyance in Language Models
by: Cheng, Yuwei, et al.
Published: (2026)

RoDiF: Robust Direct Fine-Tuning of Diffusion Policies with Corrupted Human Feedback
by: Vatsa, Amitesh, et al.
Published: (2026)

Learning to Play 7 Wonders Duel Without Human Supervision
by: Paolini, Giovanni, et al.
Published: (2024)

Cascading Bandits Robust to Adversarial Corruptions
by: Xie, Jize, et al.
Published: (2025)

On Corruption-Robustness in Performative Reinforcement Learning
by: Pollatos, Vasilis, et al.
Published: (2025)

Federated Linear Dueling Bandits
by: Huang, Xuhan, et al.
Published: (2025)

Riemannian Dueling Optimization
by: Ren, Yuxuan, et al.
Published: (2026)

Dueling Deep Reinforcement Learning for Financial Time Series
by: Giorgio, Bruno
Published: (2025)

Sparse Offline Reinforcement Learning with Corruption Robustness
by: Tran, Nam Phuong, et al.
Published: (2025)

Online Conformal Prediction with Corrupted Feedback
by: Wang, Bowen, et al.
Published: (2026)

Robust Distribution Learning with Local and Global Adversarial Corruptions
by: Nietert, Sloan, et al.
Published: (2024)

Multi-Player Approaches for Dueling Bandits
by: Raveh, Or, et al.
Published: (2024)

Online Clustering of Dueling Bandits
by: Wang, Zhiyong, et al.
Published: (2025)

ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning
by: Liu, Zeyuan, et al.
Published: (2025)

Recycling History: Efficient Recommendations from Contextual Dueling Bandits
by: Sankagiri, Suryanarayana, et al.
Published: (2025)

Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback
by: Wang, Yikai, et al.
Published: (2026)

Online Learning to Rank under Corruption: A Robust Cascading Bandits Approach
by: Ghaffari, Fatemeh, et al.
Published: (2025)

Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption
by: Chen, Yankai, et al.
Published: (2026)

Dual Active Learning for Reinforcement Learning from Human Feedback
by: Liu, Pangpang, et al.
Published: (2024)

Corruption-Robust Offline Reinforcement Learning with General Function Approximation
by: Ye, Chenlu, et al.
Published: (2023)

A Model Selection Approach for Corruption Robust Reinforcement Learning
by: Wei, Chen-Yu, et al.
Published: (2021)

Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
by: Yang, Rui, et al.
Published: (2023)

Robust Decentralized Multi-armed Bandits: From Corruption-Resilience to Byzantine-Resilience
by: Hu, Zicheng, et al.
Published: (2025)

Robust Bayesian Optimisation with Unbounded Corruptions
by: Ezzerg, Abdelhamid, et al.
Published: (2025)