:: Library Catalog

Bibliographic Details
Main authors:	Wang, Xiaobo; Wu, Tong; Tang, Min; Li, Jiaqi; Liu, Qi; Zheng, Zilong
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online access:	https://arxiv.org/abs/2605.30888

Similar Items

Adaptive Preference Optimization with Uncertainty-aware Utility Anchor
by: Wang, Xiaobo, et al.
Published: (2025)

Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
by: Zhou, Jiayi, et al.
Published: (2024)

RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)

CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
by: Wang, Hao, et al.
Published: (2026)

In-Context Editing: Learning Knowledge from Self-Induced Distributions
by: Qi, Siyuan, et al.
Published: (2024)

Reward Shaping to Mitigate Reward Hacking in RLHF
by: Fu, Jiayi, et al.
Published: (2025)

ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback
by: Hou, Zhenyu, et al.
Published: (2024)

It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
by: Lu, Taiming, et al.
Published: (2024)

How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)

Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)

Prototypical Reward Network for Data-Efficient RLHF
by: Zhang, Jinghan, et al.
Published: (2024)

Reward Model Overoptimisation in Iterated RLHF
by: Wolf, Lorenz, et al.
Published: (2025)

ARF-RLHF: Adaptive Reward-Following for RLHF through Emotion-Driven Self-Supervision and Trace-Biased Dynamic Optimization
by: Zhang, YuXuan
Published: (2025)

LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling
by: Tang, Zecheng, et al.
Published: (2025)

RAM: Towards an Ever-Improving Memory System by Learning from Communications
by: Li, Jiaqi, et al.
Published: (2024)

Quantile Regression for Distributional Reward Models in RLHF
by: Dorka, Nicolai
Published: (2024)

Taming Overconfidence in LLMs: Reward Calibration in RLHF
by: Leng, Jixuan, et al.
Published: (2024)

LooGLE: Can Long-Context Language Models Understand Long Contexts?
by: Li, Jiaqi, et al.
Published: (2023)

MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
by: Chai, Yekun, et al.
Published: (2024)

RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
by: Liu, Yang, et al.
Published: (2025)

Policy Improvement using Language Feedback Models
by: Zhong, Victor, et al.
Published: (2024)

Reward Difference Optimization For Sample Reweighting In Offline RLHF
by: Wang, Shiqi, et al.
Published: (2024)

An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding
by: Wu, Tong, et al.
Published: (2024)

Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
by: Yin, Yueqin, et al.
Published: (2025)

SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning
by: Ding, Yuyang, et al.
Published: (2025)

Reward Modeling from Natural Language Human Feedback
by: Wang, Zongqi, et al.
Published: (2026)

Evaluating Defences against Unsafe Feedback in RLHF
by: Rosati, Domenic, et al.
Published: (2024)

Continual SFT Matches Multimodal RLHF with Negative Supervision
by: Zhu, Ke, et al.
Published: (2024)

Group Robust Preference Optimization in Reward-free RLHF
by: Ramesh, Shyam Sundhar, et al.
Published: (2024)

ODIN: Disentangled Reward Mitigates Hacking in RLHF
by: Chen, Lichang, et al.
Published: (2024)

Information-Theoretic Reward Decomposition for Generalizable RLHF
by: Mao, Liyuan, et al.
Published: (2025)

Reward Generalization in RLHF: A Topological Perspective
by: Qiu, Tianyi, et al.
Published: (2024)

Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings
by: Wu, Yuning, et al.
Published: (2026)

Test-time Recursive Thinking: Self-Improvement without External Feedback
by: Zhuang, Yufan, et al.
Published: (2026)

Better Process Supervision with Bi-directional Rewarding Signals
by: Chen, Wenxiang, et al.
Published: (2025)

iFlip: Iterative Feedback-driven Counterfactual Example Refinement
by: Wang, Yilong, et al.
Published: (2026)

SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models
by: Jin, Weiyang, et al.
Published: (2025)

Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning
by: Xu, Huimin, et al.
Published: (2025)

The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models
by: Chen, Yanjun, et al.
Published: (2024)

Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision
by: He, Yinghui, et al.
Published: (2026)