| Main Authors: | Cheng, Yuwei, Yao, Fan, Liu, Xuefeng, Xu, Haifeng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.11204 |
Similar Items
Robust Reinforcement Learning from Corrupted Human Feedback
by: Bukharin, Alexander, et al.
Published: (2024)
Biased Dueling Bandits with Stochastic Delayed Feedback
by: Yi, Bongsoo, et al.
Published: (2024)
Corruption Robust Offline Reinforcement Learning with Human Feedback
by: Mandal, Debmalya, et al.
Published: (2024)
LLM Routing with Dueling Feedback
by: Chiang, Chao-Kai, et al.
Published: (2025)
Learning Personalized Ad Impact via Contextual Reinforcement Learning under Delayed Rewards
by: Cheng, Yuwei, et al.
Published: (2025)
Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions
by: Oh, Youngmin
Published: (2026)
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
by: Verma, Arun, et al.
Published: (2024)
Active Human Feedback Collection via Neural Contextual Dueling Bandits
by: Verma, Arun, et al.
Published: (2025)
Fusing Reward and Dueling Feedback in Stochastic Bandits
by: Wang, Xuchuang, et al.
Published: (2025)
Preference is More Than Comparisons: Rethinking Dueling Bandits with Augmented Human Feedback
by: Wang, Shengbo, et al.
Published: (2025)
DP-Dueling: Learning from Preference Feedback without Compromising User Privacy
by: Saha, Aadirupa, et al.
Published: (2024)
Linear and Neural Dueling Bandits with Delayed Feedback
by: Wang, Xiangyi, et al.
Published: (2026)
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
by: Di, Qiwei, et al.
Published: (2024)
Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
by: Yi, Bingji, et al.
Published: (2025)
Out-of-Distribution Learning with Human Feedback
by: Bai, Haoyue, et al.
Published: (2024)
Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback
by: Nika, Andi, et al.
Published: (2026)
Fine-Tuning Improves Information Conveyance in Language Models
by: Cheng, Yuwei, et al.
Published: (2026)
RoDiF: Robust Direct Fine-Tuning of Diffusion Policies with Corrupted Human Feedback
by: Vatsa, Amitesh, et al.
Published: (2026)
Learning to Play 7 Wonders Duel Without Human Supervision
by: Paolini, Giovanni, et al.
Published: (2024)
Cascading Bandits Robust to Adversarial Corruptions
by: Xie, Jize, et al.
Published: (2025)
On Corruption-Robustness in Performative Reinforcement Learning
by: Pollatos, Vasilis, et al.
Published: (2025)
Federated Linear Dueling Bandits
by: Huang, Xuhan, et al.
Published: (2025)
Riemannian Dueling Optimization
by: Ren, Yuxuan, et al.
Published: (2026)
Dueling Deep Reinforcement Learning for Financial Time Series
by: Giorgio, Bruno
Published: (2025)
Sparse Offline Reinforcement Learning with Corruption Robustness
by: Tran, Nam Phuong, et al.
Published: (2025)
Online Conformal Prediction with Corrupted Feedback
by: Wang, Bowen, et al.
Published: (2026)
Robust Distribution Learning with Local and Global Adversarial Corruptions
by: Nietert, Sloan, et al.
Published: (2024)
Multi-Player Approaches for Dueling Bandits
by: Raveh, Or, et al.
Published: (2024)
Online Clustering of Dueling Bandits
by: Wang, Zhiyong, et al.
Published: (2025)
ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning
by: Liu, Zeyuan, et al.
Published: (2025)
Recycling History: Efficient Recommendations from Contextual Dueling Bandits
by: Sankagiri, Suryanarayana, et al.
Published: (2025)
Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback
by: Wang, Yikai, et al.
Published: (2026)
Online Learning to Rank under Corruption: A Robust Cascading Bandits Approach
by: Ghaffari, Fatemeh, et al.
Published: (2025)
Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption
by: Chen, Yankai, et al.
Published: (2026)
Dual Active Learning for Reinforcement Learning from Human Feedback
by: Liu, Pangpang, et al.
Published: (2024)
Corruption-Robust Offline Reinforcement Learning with General Function Approximation
by: Ye, Chenlu, et al.
Published: (2023)
A Model Selection Approach for Corruption Robust Reinforcement Learning
by: Wei, Chen-Yu, et al.
Published: (2021)
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
by: Yang, Rui, et al.
Published: (2023)
Robust Decentralized Multi-armed Bandits: From Corruption-Resilience to Byzantine-Resilience
by: Hu, Zicheng, et al.
Published: (2025)
Robust Bayesian Optimisation with Unbounded Corruptions
by: Ezzerg, Abdelhamid, et al.
Published: (2025)