| Main Authors: | Xu, Wenyuan; Zuo, Xiaochen; Xin, Chao; Yue, Yu; Yan, Lin; Wu, Yonghui |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.04950 |
Similar Items
Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)
Policy Filtration for RLHF to Mitigate Noise in Reward Models
by: Zhang, Chuheng, et al.
Published: (2024)
PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier
by: Jiang, Yuhua, et al.
Published: (2025)
Policy Optimization in RLHF: The Impact of Out-of-preference Data
by: Li, Ziniu, et al.
Published: (2023)
Factored Causal Representation Learning for Robust Reward Modeling in RLHF
by: Yang, Yupei, et al.
Published: (2026)
Learning a Pessimistic Reward Model in RLHF
by: Xu, Yinglun, et al.
Published: (2025)
Optimal Design for Reward Modeling in RLHF
by: Scheid, Antoine, et al.
Published: (2024)
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
by: Lu, Taiming, et al.
Published: (2024)
RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)
Reward Generalization in RLHF: A Topological Perspective
by: Qiu, Tianyi, et al.
Published: (2024)
How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)
Unifying Stable Optimization and Reference Regularization in RLHF
by: He, Li, et al.
Published: (2026)
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
by: Huang, Jiawei, et al.
Published: (2025)
Beyond RLHF: A Unified Theoretical Framework of Alignment
by: Yun, Jihun, et al.
Published: (2025)
Provably Efficient Online RLHF with One-Pass Reward Modeling
by: Li, Long-Fei, et al.
Published: (2025)
Group Robust Preference Optimization in Reward-free RLHF
by: Ramesh, Shyam Sundhar, et al.
Published: (2024)
On the Exponential Convergence for Offline RLHF with Pairwise Comparisons
by: Chen, Zhirui, et al.
Published: (2024)
A Theoretical Framework for Partially Observed Reward-States in RLHF
by: Kausik, Chinmaya, et al.
Published: (2024)
Reward Model Overoptimisation in Iterated RLHF
by: Wolf, Lorenz, et al.
Published: (2025)
Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking
by: Miao, Yuchun, et al.
Published: (2025)
Dataset Reset Policy Optimization for RLHF
by: Chang, Jonathan D., et al.
Published: (2024)
Reward Shaping to Mitigate Reward Hacking in RLHF
by: Fu, Jiayi, et al.
Published: (2025)
Quantile Regression for Distributional Reward Models in RLHF
by: Dorka, Nicolai
Published: (2024)
Information-Theoretic Reward Decomposition for Generalizable RLHF
by: Mao, Liyuan, et al.
Published: (2025)
Beyond Pairwise Preferences: Listwise Reward-Aware Alignment for Diffusion Models
by: Wang, Austin, et al.
Published: (2026)
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
by: Dai, Juntao, et al.
Published: (2025)
One Framework to Rule Them All: Unifying RL-Based and RL-Free Methods in RLHF
by: Cai, Xin
Published: (2025)
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function
by: Wang, Zhichao, et al.
Published: (2024)
Circuit-Aware Reward Training: A Mechanistic Framework for Longtail Robustness in RLHF
by: Liu, Jing
Published: (2025)
Efficient Federated RLHF via Zeroth-Order Policy Optimization
by: Wang, Deyi, et al.
Published: (2026)
Robust Post-Training for Generative Recommenders: Why Exponential Reward-Weighted SFT Outperforms RLHF
by: Chidambaram, Keertana, et al.
Published: (2026)
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
by: Hu, Jian, et al.
Published: (2024)
Bias Fitting to Mitigate Length Bias of Reward Model in RLHF
by: Zhao, Kangwen, et al.
Published: (2025)
InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling
by: Miao, Yuchun, et al.
Published: (2024)
Projection Optimization: A General Framework for Multi-Objective and Multi-Group RLHF
by: Xiong, Nuoya, et al.
Published: (2025)
Policy Optimization Algorithms in a Unified Framework
by: Wu, Shuang
Published: (2025)
BadReward: Clean-Label Poisoning of Reward Models in Text-to-Image RLHF
by: Duan, Kaiwen, et al.
Published: (2025)
DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
by: Li, Gang, et al.
Published: (2025)
A First-Order Logic-Based Alternative to Reward Models in RLHF
by: Jian, Chunjin, et al.
Published: (2025)
Towards a Theoretical Understanding to the Generalization of RLHF
by: Li, Zhaochun, et al.
Published: (2026)