| Main Authors: | Zhang, Qining, Wei, Honghao, Ying, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.07455 |
Similar Items
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
by: Zhang, Qining, et al.
Published: (2024)
Provable Reinforcement Learning from Human Feedback with an Unknown Link Function
by: Zhang, Qining, et al.
Published: (2025)
Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms
by: Zhang, Qining, et al.
Published: (2023)
Dense Reward for Free in Reinforcement Learning from Human Feedback
by: Chan, Alex J., et al.
Published: (2024)
Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs
by: Zhou, Zihan, et al.
Published: (2023)
Privacy-Preserving Reinforcement Learning from Human Feedback via Decoupled Reward Modeling
by: Cho, Young Hyun, et al.
Published: (2026)
Cost Aware Best Arm Identification
by: Kanarios, Kellen, et al.
Published: (2024)
Efficient Federated RLHF via Zeroth-Order Policy Optimization
by: Wang, Deyi, et al.
Published: (2026)
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
by: Peng, Xiyue, et al.
Published: (2024)
A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning
by: Chen, Ying-Tu, et al.
Published: (2026)
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
by: Zhai, Yuanzhao, et al.
Published: (2023)
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble
by: Zhang, Shun, et al.
Published: (2024)
Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
by: Ackermann, Johannes, et al.
Published: (2025)
Scaling Reward Modeling without Human Supervision
by: Fan, Jingxuan, et al.
Published: (2026)
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
by: Ye, Chenlu, et al.
Published: (2024)
Policy Learning from Large Vision-Language Model Feedback without Reward Modeling
by: Luu, Tung M., et al.
Published: (2025)
Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
by: Ackermann, Johannes, et al.
Published: (2026)
REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback
by: Chakraborty, Souradip, et al.
Published: (2023)
Uncertainty Quantification for Large Language Model Reward Learning under Heterogeneous Human Feedback
by: Liu, Pangpang, et al.
Published: (2025)
Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms
by: Aggarwal, Vaneet, et al.
Published: (2024)
Dual Active Learning for Reinforcement Learning from Human Feedback
by: Liu, Pangpang, et al.
Published: (2024)
Which Rewards Matter? Reward Selection for Reinforcement Learning under Limited Feedback
by: Chaudhari, Shreyas, et al.
Published: (2025)
Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds
by: Huang, Jiayi, et al.
Published: (2023)
Reinforcement Learning from Human Feedback
by: Lambert, Nathan
Published: (2025)
Reinforcement Learning from Human Feedback: A Statistical Perspective
by: Liu, Pangpang, et al.
Published: (2026)
Constraints as Rewards: Reinforcement Learning for Robots without Reward Functions
by: Ishihara, Yu, et al.
Published: (2025)
Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models
by: Solway, Alec
Published: (2024)
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
by: Yang, Kai, et al.
Published: (2023)
Robust Reinforcement Learning from Corrupted Human Feedback
by: Bukharin, Alexander, et al.
Published: (2024)
Efficient Reinforcement Learning from Human Feedback via Bayesian Preference Inference
by: Cercola, Matteo, et al.
Published: (2025)
TIC-GRPO: Provable and Efficient Optimization for Reinforcement Learning from Human Feedback
by: Pang, Lei, et al.
Published: (2025)
Adaptive Querying for Reward Learning from Human Feedback
by: Anand, Yashwanthi, et al.
Published: (2024)
Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback
by: Lee, Seong Jin, et al.
Published: (2024)
Strategyproof Reinforcement Learning from Human Feedback
by: Buening, Thomas Kleine, et al.
Published: (2025)
Towards User-level Private Reinforcement Learning with Human Feedback
by: Zhang, Jiaming, et al.
Published: (2025)
Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning
by: Wei, Honghao, et al.
Published: (2024)
Federated Learning with Instance-Dependent Noisy Label
by: Wang, Lei, et al.
Published: (2023)
A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning
by: Sun, Shengjie, et al.
Published: (2024)
Free Process Rewards without Process Labels
by: Yuan, Lifan, et al.
Published: (2024)
Data-dependent Exploration for Online Reinforcement Learning from Human Feedback
by: Zhang, Zhen-Yu, et al.
Published: (2026)