| Main Authors: | Chen, Ruitao; Wang, Liwei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.11226 |
Similar Items
Dual Active Learning for Reinforcement Learning from Human Feedback
by: Liu, Pangpang, et al.
Published: (2024)
Reinforcement Learning from Human Feedback with Active Queries
by: Ji, Kaixuan, et al.
Published: (2024)
Multi-turn Reinforcement Learning from Preference Human Feedback
by: Shani, Lior, et al.
Published: (2024)
Reinforcement Learning from Multi-level and Episodic Human Feedback
by: Elahi, Muhammad Qasim, et al.
Published: (2025)
Reinforcement Learning from Human Feedback
by: Lambert, Nathan
Published: (2025)
ClaHF: A Human Feedback-inspired Reinforcement Learning Framework for Improving Classification Tasks
by: Xu, Tianxiang, et al.
Published: (2026)
Strategyproof Reinforcement Learning from Human Feedback
by: Buening, Thomas Kleine, et al.
Published: (2025)
Learning to Reason from Feedback at Test-Time
by: Li, Yanyang, et al.
Published: (2025)
Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback
by: Ji, Jiaming, et al.
Published: (2025)
Robust Reinforcement Learning from Corrupted Human Feedback
by: Bukharin, Alexander, et al.
Published: (2024)
Provable Multi-Party Reinforcement Learning with Diverse Human Feedback
by: Zhong, Huiying, et al.
Published: (2024)
Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback
by: Nika, Andi, et al.
Published: (2026)
Continual Learning of Numerous Tasks from Long-tail Distributions
by: Kang, Liwei, et al.
Published: (2024)
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
by: Swamy, Gokul, et al.
Published: (2024)
Dense Reward for Free in Reinforcement Learning from Human Feedback
by: Chan, Alex J., et al.
Published: (2024)
Reinforcement Learning from Human Feedback: A Statistical Perspective
by: Liu, Pangpang, et al.
Published: (2026)
Parameter Efficient Reinforcement Learning from Human Feedback
by: Sidahmed, Hakim, et al.
Published: (2024)
A Survey of Reinforcement Learning from Human Feedback
by: Kaufmann, Timo, et al.
Published: (2023)
Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback
by: Lee, Seong Jin, et al.
Published: (2024)
Data-dependent Exploration for Online Reinforcement Learning from Human Feedback
by: Zhang, Zhen-Yu, et al.
Published: (2026)
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
by: Lambert, Nathan, et al.
Published: (2023)
Provable Reinforcement Learning from Human Feedback with an Unknown Link Function
by: Zhang, Qining, et al.
Published: (2025)
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback
by: Chakraborty, Souradip, et al.
Published: (2023)
Towards User-level Private Reinforcement Learning with Human Feedback
by: Zhang, Jiaming, et al.
Published: (2025)
Distributionally Robust Reinforcement Learning with Human Feedback
by: Mandal, Debmalya, et al.
Published: (2025)
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
by: Hong, Ilgee, et al.
Published: (2024)
Corruption Robust Offline Reinforcement Learning with Human Feedback
by: Mandal, Debmalya, et al.
Published: (2024)
Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback
by: Xiao, Teng, et al.
Published: (2024)
Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback
by: Kim, Gihoon, et al.
Published: (2026)
Modeling Output-Level Task Relatedness in Multi-Task Learning with Feedback Mechanism
by: Xi, Xiangming, et al.
Published: (2024)
M3HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality
by: Wang, Ziyan, et al.
Published: (2025)
Multi-Task Reinforcement Learning for Quadrotors
by: Xing, Jiaxu, et al.
Published: (2024)
Reinforcement Learning from Denoising Feedback
by: He, Qi, et al.
Published: (2026)
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble
by: Zhang, Shun, et al.
Published: (2024)
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
by: Ye, Chenlu, et al.
Published: (2024)
TIC-GRPO: Provable and Efficient Optimization for Reinforcement Learning from Human Feedback
by: Pang, Lei, et al.
Published: (2025)
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
by: Shen, Wei, et al.
Published: (2025)
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
by: Poddar, Sriyash, et al.
Published: (2024)
Learning Personalized Driving Styles via Reinforcement Learning from Human Feedback
by: Li, Derun, et al.
Published: (2025)
Reinforcement Learning with Segment Feedback
by: Du, Yihan, et al.
Published: (2025)