Bibliographic Details
Main Authors:	Li, Long-Fei; Qian, Yu-Yang; Zhao, Peng; Zhou, Zhi-Hua
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.07193

Similar Items

Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
by: Li, Long-Fei, et al.
Published: (2024)

RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)

Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update
by: Wang, Jing, et al.
Published: (2025)

Greedy Sampling Is Provably Efficient for RLHF
by: Wu, Di, et al.
Published: (2025)

Provably Mitigating Corruption, Overoptimization, and Verbosity Simultaneously in Offline and Online RLHF/DPO Alignment
by: Chen, Ziyi, et al.
Published: (2025)

Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs
by: Li, Long-Fei, et al.
Published: (2024)

Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition
by: Li, Long-Fei, et al.
Published: (2024)

Bias Fitting to Mitigate Length Bias of Reward Model in RLHF
by: Zhao, Kangwen, et al.
Published: (2025)

Efficient Methods for Non-stationary Online Learning
by: Zhao, Peng, et al.
Published: (2023)

Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)

Optimal Design for Reward Modeling in RLHF
by: Scheid, Antoine, et al.
Published: (2024)

Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach
by: Yan, Yu-Hu, et al.
Published: (2023)

A Simple, Optimal and Efficient Algorithm for Online Exp-Concave Optimization
by: Wang, Yi-Han, et al.
Published: (2025)

Policy Filtration for RLHF to Mitigate Noise in Reward Models
by: Zhang, Chuheng, et al.
Published: (2024)

Reward Shaping to Mitigate Reward Hacking in RLHF
by: Fu, Jiayi, et al.
Published: (2025)

How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)

Factored Causal Representation Learning for Robust Reward Modeling in RLHF
by: Yang, Yupei, et al.
Published: (2026)

Optimistic Online-to-Batch Conversions for Accelerated Convergence and Universality
by: Yan, Yu-Hu, et al.
Published: (2025)

Learning a Pessimistic Reward Model in RLHF
by: Xu, Yinglun, et al.
Published: (2025)

Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
by: Dai, Juntao, et al.
Published: (2025)

Reward Model Overoptimisation in Iterated RLHF
by: Wolf, Lorenz, et al.
Published: (2025)

Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
by: Huang, Jiawei, et al.
Published: (2025)

TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree
by: Qian, Yu-Yang, et al.
Published: (2025)

Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking
by: Miao, Yuchun, et al.
Published: (2025)

Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization
by: Zhao, Peng, et al.
Published: (2021)

Adaptivity and Universality: Problem-dependent Universal Regret for Online Convex Optimization
by: Zhao, Peng, et al.
Published: (2025)

One-Step Bellman Alignment Enables Provably Efficient Transfer in Online RL
by: Chen, Elynn, et al.
Published: (2026)

Reward Generalization in RLHF: A Topological Perspective
by: Qiu, Tianyi, et al.
Published: (2024)

Quantile Regression for Distributional Reward Models in RLHF
by: Dorka, Nicolai
Published: (2024)

Gradient-Variation Online Learning under Generalized Smoothness
by: Xie, Yan-Feng, et al.
Published: (2024)

BadReward: Clean-Label Poisoning of Reward Models in Text-to-Image RLHF
by: Duan, Kaiwen, et al.
Published: (2025)

Accelerating RLHF Training with Reward Variance Increase
by: Yang, Zonglin, et al.
Published: (2025)

It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
by: Lu, Taiming, et al.
Published: (2024)

Provably Efficient Interactive-Grounded Learning with Personalized Reward
by: Zhang, Mengxiao, et al.
Published: (2024)

ODIN: Disentangled Reward Mitigates Hacking in RLHF
by: Chen, Lichang, et al.
Published: (2024)

Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment
by: Yang, Zhiqin, et al.
Published: (2026)

A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization
by: Xu, Wenyuan, et al.
Published: (2025)

Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update
by: Zhang, Yu-Jie, et al.
Published: (2025)

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
by: Liu, Zhihan, et al.
Published: (2024)

Robust Length Prediction: A Perspective from Heavy-Tailed Prompt-Conditioned Distributions
by: Wang, Jing, et al.
Published: (2026)