| Main Author: | Dorka, Nicolai |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.10164 |
Similar Items
RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)
How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)
Reward Model Overoptimisation in Iterated RLHF
by: Wolf, Lorenz, et al.
Published: (2025)
Reward Shaping to Mitigate Reward Hacking in RLHF
by: Fu, Jiayi, et al.
Published: (2025)
Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
by: Lu, Taiming, et al.
Published: (2024)
ODIN: Disentangled Reward Mitigates Hacking in RLHF
by: Chen, Lichang, et al.
Published: (2024)
Information-Theoretic Reward Decomposition for Generalizable RLHF
by: Mao, Liyuan, et al.
Published: (2025)
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
by: Zhu, Banghua, et al.
Published: (2024)
Mitigating Reward Hacking in RLHF via Advantage Sign Robustness
by: Ono, Shinnosuke, et al.
Published: (2026)
CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
by: Wang, Hao, et al.
Published: (2026)
Reward Generalization in RLHF: A Topological Perspective
by: Qiu, Tianyi, et al.
Published: (2024)
RLHF in an SFT Way: From Optimal Solution to Reward-Weighted Alignment
by: Du, Yuhao, et al.
Published: (2025)
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
by: Gao, Zhaolin, et al.
Published: (2024)
Training a Vision Language Model as Smartphone Assistant
by: Dorka, Nicolai, et al.
Published: (2024)
Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
by: Park, Jungsoo, et al.
Published: (2026)
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
by: Hu, Jian, et al.
Published: (2024)
RLHF and IIA: Perverse Incentives
by: Xu, Wanqiao, et al.
Published: (2023)
Dataset Reset Policy Optimization for RLHF
by: Chang, Jonathan D., et al.
Published: (2024)
Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy
by: Zhu, Yu, et al.
Published: (2024)
Active Preference Optimization for Sample Efficient RLHF
by: Das, Nirjhar, et al.
Published: (2024)
The Perfect Blend: Redefining RLHF with Mixture of Judges
by: Xu, Tengyu, et al.
Published: (2024)
WPO: Enhancing RLHF with Weighted Preference Optimization
by: Zhou, Wenxuan, et al.
Published: (2024)
Understanding the Effects of RLHF on LLM Generalisation and Diversity
by: Kirk, Robert, et al.
Published: (2023)
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
by: Liang, Kaiqu, et al.
Published: (2025)
General Exploratory Bonus for Optimistic Exploration in RLHF
by: Li, Wendi, et al.
Published: (2025)
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
by: Noukhovitch, Michael, et al.
Published: (2024)
DPO Meets PPO: Reinforced Token Optimization for RLHF
by: Zhong, Han, et al.
Published: (2024)
Adaptive Margin RLHF via Preference over Preferences
by: Chittepu, Yaswanth, et al.
Published: (2025)
An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training
by: Xiao, Youshao, et al.
Published: (2023)
Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison
by: Shen, Judy Hanwen, et al.
Published: (2024)
MaxMin-RLHF: Alignment with Diverse Human Preferences
by: Chakraborty, Souradip, et al.
Published: (2024)
FlowRL: Matching Reward Distributions for LLM Reasoning
by: Zhu, Xuekai, et al.
Published: (2025)
RewardAnything: Generalizable Principle-Following Reward Models
by: Yu, Zhuohao, et al.
Published: (2025)
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
by: Xie, Tengyang, et al.
Published: (2024)
Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
by: Du, Yihan, et al.
Published: (2024)
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
by: Dang, John, et al.
Published: (2024)
RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods
by: Sharma, Raghav, et al.
Published: (2025)
M-RewardBench: Evaluating Reward Models in Multilingual Settings
by: Gureja, Srishti, et al.
Published: (2024)
Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization
by: Kim, Sunghwan, et al.
Published: (2025)