| Main Authors: | Wang, Xiaobo, Wu, Tong, Tang, Min, Li, Jiaqi, Liu, Qi, Zheng, Zilong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.30888 |
Similar Items
Adaptive Preference Optimization with Uncertainty-aware Utility Anchor
by: Wang, Xiaobo, et al.
Published: (2025)
Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
by: Zhou, Jiayi, et al.
Published: (2024)
RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)
CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
by: Wang, Hao, et al.
Published: (2026)
In-Context Editing: Learning Knowledge from Self-Induced Distributions
by: Qi, Siyuan, et al.
Published: (2024)
Reward Shaping to Mitigate Reward Hacking in RLHF
by: Fu, Jiayi, et al.
Published: (2025)
ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback
by: Hou, Zhenyu, et al.
Published: (2024)
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
by: Lu, Taiming, et al.
Published: (2024)
How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)
Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)
Prototypical Reward Network for Data-Efficient RLHF
by: Zhang, Jinghan, et al.
Published: (2024)
Reward Model Overoptimisation in Iterated RLHF
by: Wolf, Lorenz, et al.
Published: (2025)
ARF-RLHF: Adaptive Reward-Following for RLHF through Emotion-Driven Self-Supervision and Trace-Biased Dynamic Optimization
by: Zhang, YuXuan
Published: (2025)
LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling
by: Tang, Zecheng, et al.
Published: (2025)
RAM: Towards an Ever-Improving Memory System by Learning from Communications
by: Li, Jiaqi, et al.
Published: (2024)
Quantile Regression for Distributional Reward Models in RLHF
by: Dorka, Nicolai
Published: (2024)
Taming Overconfidence in LLMs: Reward Calibration in RLHF
by: Leng, Jixuan, et al.
Published: (2024)
LooGLE: Can Long-Context Language Models Understand Long Contexts?
by: Li, Jiaqi, et al.
Published: (2023)
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
by: Chai, Yekun, et al.
Published: (2024)
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
by: Liu, Yang, et al.
Published: (2025)
Policy Improvement using Language Feedback Models
by: Zhong, Victor, et al.
Published: (2024)
Reward Difference Optimization For Sample Reweighting In Offline RLHF
by: Wang, Shiqi, et al.
Published: (2024)
An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding
by: Wu, Tong, et al.
Published: (2024)
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
by: Yin, Yueqin, et al.
Published: (2025)
SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning
by: Ding, Yuyang, et al.
Published: (2025)
Reward Modeling from Natural Language Human Feedback
by: Wang, Zongqi, et al.
Published: (2026)
Evaluating Defences against Unsafe Feedback in RLHF
by: Rosati, Domenic, et al.
Published: (2024)
Continual SFT Matches Multimodal RLHF with Negative Supervision
by: Zhu, Ke, et al.
Published: (2024)
Group Robust Preference Optimization in Reward-free RLHF
by: Ramesh, Shyam Sundhar, et al.
Published: (2024)
ODIN: Disentangled Reward Mitigates Hacking in RLHF
by: Chen, Lichang, et al.
Published: (2024)
Information-Theoretic Reward Decomposition for Generalizable RLHF
by: Mao, Liyuan, et al.
Published: (2025)
Reward Generalization in RLHF: A Topological Perspective
by: Qiu, Tianyi, et al.
Published: (2024)
Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings
by: Wu, Yuning, et al.
Published: (2026)
Test-time Recursive Thinking: Self-Improvement without External Feedback
by: Zhuang, Yufan, et al.
Published: (2026)
Better Process Supervision with Bi-directional Rewarding Signals
by: Chen, Wenxiang, et al.
Published: (2025)
iFlip: Iterative Feedback-driven Counterfactual Example Refinement
by: Wang, Yilong, et al.
Published: (2026)
SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models
by: Jin, Weiyang, et al.
Published: (2025)
Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning
by: Xu, Huimin, et al.
Published: (2025)
The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models
by: Chen, Yanjun, et al.
Published: (2024)
Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision
by: He, Yinghui, et al.
Published: (2026)