| Main Authors: | Ning, Meiling, Zhang, Zhongbao, Ye, Junda, Guo, Jiabao, Guan, Qingyuan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.18212 |
Similar Items
R-Align: Enhancing Generative Reward Models through Rationale-Centric Meta-Judging
by: Lai, Yanlin, et al.
Published: (2026)
Checklists Are Better Than Reward Models For Aligning Language Models
by: Viswanathan, Vijay, et al.
Published: (2025)
Advancing Text Classification with Large Language Models and Neural Attention Mechanisms
by: Lyu, Ning, et al.
Published: (2025)
The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models
by: Chen, Yanjun, et al.
Published: (2024)
Med-RewardBench: Benchmarking Reward Models and Judges for Medical Multimodal Large Language Models
by: Ding, Meidan, et al.
Published: (2025)
AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling
by: Miao, Yongliang, et al.
Published: (2026)
Knowledge-Augmented Large Language Model Agents for Explainable Financial Decision-Making
by: Zhang, Qingyuan, et al.
Published: (2025)
Language Models that Think, Chat Better
by: Bhaskar, Adithya, et al.
Published: (2025)
IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation
by: Wen, Bosi, et al.
Published: (2026)
Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning?
by: Laskar, Md Tahmid Rahman, et al.
Published: (2025)
Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
by: Liu, Shuliang, et al.
Published: (2025)
FedJudge: Federated Legal Large Language Model
by: Yue, Linan, et al.
Published: (2023)
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
by: Wu, Tianhao, et al.
Published: (2024)
CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards
by: Zhang, Taolin, et al.
Published: (2025)
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do
by: Son, Guijin, et al.
Published: (2024)
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
by: Park, Sungjin, et al.
Published: (2024)
Energy-Based Reward Models for Robust Language Model Alignment
by: Lochab, Anamika, et al.
Published: (2025)
Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
by: Ye, Ziyi, et al.
Published: (2024)
Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
by: Liu, Qiyuan, et al.
Published: (2025)
Better Process Supervision with Bi-directional Rewarding Signals
by: Chen, Wenxiang, et al.
Published: (2025)
Debate Helps Weak Judges Reward Stronger Models
by: Elasky, Ethan, et al.
Published: (2026)
Making Large Language Models Perform Better in Knowledge Graph Completion
by: Zhang, Yichi, et al.
Published: (2023)
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
by: Zhou, Enyu, et al.
Published: (2024)
Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards
by: Wei, Xiaolong, et al.
Published: (2025)
LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling
by: Tang, Zecheng, et al.
Published: (2025)
Continual Learning Using Only Large Language Model Prompting
by: Qiu, Jiabao, et al.
Published: (2024)
Advancing the Robustness of Large Language Models through Self-Denoised Smoothing
by: Ji, Jiabao, et al.
Published: (2024)
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
by: Fu, Deqing, et al.
Published: (2024)
Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis
by: Tao, Leitian, et al.
Published: (2025)
RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models
by: Tran, Hieu, et al.
Published: (2024)
S2J: Bridging the Gap Between Solving and Judging Ability in Generative Reward Models
by: Sun, Shaoning, et al.
Published: (2025)
Smaller Language Models Are Better Instruction Evolvers
by: Hui, Tingfeng, et al.
Published: (2024)
Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference
by: Qiu, Wenjie, et al.
Published: (2025)
Rule or Story, Which is a Better Commonsense Expression for Talking with Large Language Models?
by: Bian, Ning, et al.
Published: (2024)
Two Minds Better Than One: Collaborative Reward Modeling for LLM Alignment
by: Zhang, Jiazheng, et al.
Published: (2025)
Advancing Block Diffusion Language Models for Test-Time Scaling
by: Lu, Yi, et al.
Published: (2026)
Learning to Self-Verify Makes Language Models Better Reasoners
by: Chen, Yuxin, et al.
Published: (2026)
Training Language Model to Critique for Better Refinement
by: Yu, Tianshu, et al.
Published: (2025)
MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling
by: Feng, Zhaopeng, et al.
Published: (2025)
MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models
by: Son, Guijin, et al.
Published: (2024)