
Bibliographic Details
Main Author:	Dorka, Nicolai
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2409.10164

Similar Items

RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)

How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)

Reward Model Overoptimisation in Iterated RLHF
by: Wolf, Lorenz, et al.
Published: (2025)

Reward Shaping to Mitigate Reward Hacking in RLHF
by: Fu, Jiayi, et al.
Published: (2025)

Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)

It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
by: Lu, Taiming, et al.
Published: (2024)

ODIN: Disentangled Reward Mitigates Hacking in RLHF
by: Chen, Lichang, et al.
Published: (2024)

Information-Theoretic Reward Decomposition for Generalizable RLHF
by: Mao, Liyuan, et al.
Published: (2025)

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
by: Zhu, Banghua, et al.
Published: (2024)

Mitigating Reward Hacking in RLHF via Advantage Sign Robustness
by: Ono, Shinnosuke, et al.
Published: (2026)

CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
by: Wang, Hao, et al.
Published: (2026)

Reward Generalization in RLHF: A Topological Perspective
by: Qiu, Tianyi, et al.
Published: (2024)

RLHF in an SFT Way: From Optimal Solution to Reward-Weighted Alignment
by: Du, Yuhao, et al.
Published: (2025)

Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
by: Gao, Zhaolin, et al.
Published: (2024)

Training a Vision Language Model as Smartphone Assistant
by: Dorka, Nicolai, et al.
Published: (2024)

Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
by: Park, Jungsoo, et al.
Published: (2026)

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
by: Hu, Jian, et al.
Published: (2024)

RLHF and IIA: Perverse Incentives
by: Xu, Wanqiao, et al.
Published: (2023)

Dataset Reset Policy Optimization for RLHF
by: Chang, Jonathan D., et al.
Published: (2024)

Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy
by: Zhu, Yu, et al.
Published: (2024)

Active Preference Optimization for Sample Efficient RLHF
by: Das, Nirjhar, et al.
Published: (2024)

The Perfect Blend: Redefining RLHF with Mixture of Judges
by: Xu, Tengyu, et al.
Published: (2024)

WPO: Enhancing RLHF with Weighted Preference Optimization
by: Zhou, Wenxuan, et al.
Published: (2024)

Understanding the Effects of RLHF on LLM Generalisation and Diversity
by: Kirk, Robert, et al.
Published: (2023)

RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
by: Liang, Kaiqu, et al.
Published: (2025)

General Exploratory Bonus for Optimistic Exploration in RLHF
by: Li, Wendi, et al.
Published: (2025)

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
by: Noukhovitch, Michael, et al.
Published: (2024)

DPO Meets PPO: Reinforced Token Optimization for RLHF
by: Zhong, Han, et al.
Published: (2024)

Adaptive Margin RLHF via Preference over Preferences
by: Chittepu, Yaswanth, et al.
Published: (2025)

An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training
by: Xiao, Youshao, et al.
Published: (2023)

Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison
by: Shen, Judy Hanwen, et al.
Published: (2024)

MaxMin-RLHF: Alignment with Diverse Human Preferences
by: Chakraborty, Souradip, et al.
Published: (2024)

FlowRL: Matching Reward Distributions for LLM Reasoning
by: Zhu, Xuekai, et al.
Published: (2025)

RewardAnything: Generalizable Principle-Following Reward Models
by: Yu, Zhuohao, et al.
Published: (2025)

Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
by: Xie, Tengyang, et al.
Published: (2024)

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
by: Du, Yihan, et al.
Published: (2024)

RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
by: Dang, John, et al.
Published: (2024)

RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods
by: Sharma, Raghav, et al.
Published: (2025)

M-RewardBench: Evaluating Reward Models in Multilingual Settings
by: Gureja, Srishti, et al.
Published: (2024)

Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization
by: Kim, Sunghwan, et al.
Published: (2025)