| Main Authors: | Pisano, Raffaele; Navigli, Roberto |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.17957 |
Similar Items
Truncated Step-Level Sampling with Process Rewards for Retrieval-Augmented Reasoning
by: Samarinas, Chris, et al.
Published: (2026)
Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis
by: Qiu, Zhisong, et al.
Published: (2026)
Uncertainty-Aware Step-wise Verification with Generative Reward Models
by: Ye, Zihuiwen, et al.
Published: (2025)
Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models
by: Nie, Shuo, et al.
Published: (2026)
The Bidirectional Process Reward Model
by: Zhang, Lingyin, et al.
Published: (2025)
Process Reward Models for Sentence-Level Verification of LVLM Radiology Reports
by: Thomas, Alois, et al.
Published: (2025)
Dynamic and Generalizable Process Reward Modeling
by: Yin, Zhangyue, et al.
Published: (2025)
Process-Supervised Reward Models for Verifying Clinical Note Generation: A Scalable Approach Guided by Domain Expertise
by: Wang, Hanyin, et al.
Published: (2024)
GRAM: A Generative Foundation Reward Model for Reward Generalization
by: Wang, Chenglong, et al.
Published: (2025)
Agentic Reinforcement Learning with Implicit Step Rewards
by: Liu, Xiaoqian, et al.
Published: (2025)
FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
by: Scirè, Alessandro, et al.
Published: (2024)
Process Reward Models That Think
by: Khalifa, Muhammad, et al.
Published: (2025)
Entropy-Regularized Process Reward Model
by: Zhang, Hanning, et al.
Published: (2024)
Discriminative Policy Optimization for Token-Level Reward Models
by: Chen, Hongzhan, et al.
Published: (2025)
Long-form RewardBench: Evaluating Reward Models for Long-form Generation
by: Huang, Hui, et al.
Published: (2026)
IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning
by: Liang, Zihan, et al.
Published: (2026)
Reward Model Perspectives: Whose Opinions Do Reward Models Reward?
by: Elle
Published: (2025)
OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment
by: Liu, Tianci, et al.
Published: (2025)
Scalable Ensembling For Mitigating Reward Overoptimisation
by: Ahmed, Ahmed M., et al.
Published: (2024)
MUSIC: MUlti-Step Instruction Contrast for Multi-Turn Reward Models
by: Li, Wenzhe, et al.
Published: (2025)
An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning
by: Sun, Wei, et al.
Published: (2025)
Interpretable Coreference Resolution Evaluation Using Explicit Semantics
by: Gatti, Bruno, et al.
Published: (2026)
Has Machine Translation Evaluation Achieved Human Parity? The Human Reference and the Limits of Progress
by: Proietti, Lorenzo, et al.
Published: (2025)
LiteraryQA: Towards Effective Evaluation of Long-document Narrative QA
by: Bonomo, Tommaso, et al.
Published: (2025)
Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends
by: Martinelli, Giuliano, et al.
Published: (2024)
Multi-Turn Code Generation Through Single-Step Rewards
by: Jain, Arnav Kumar, et al.
Published: (2025)
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing
by: Jiao, Fangkai, et al.
Published: (2024)
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
by: Song, Mingyang, et al.
Published: (2025)
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
by: Xu, Yuancheng, et al.
Published: (2024)
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
by: Deng, Haikang, et al.
Published: (2023)
Demystifying Multilingual Chain-of-Thought in Process Reward Modeling
by: Wang, Weixuan, et al.
Published: (2025)
R-PRM: Reasoning-Driven Process Reward Modeling
by: She, Shuaijie, et al.
Published: (2025)
AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress
by: Xi, Zhiheng, et al.
Published: (2025)
DreamPRM-Code: Function-as-Step Process Reward Model with Label Correction for LLM Coding
by: Zhang, Ruiyi, et al.
Published: (2025)
RewardBench 2: Advancing Reward Model Evaluation
by: Malik, Saumya, et al.
Published: (2025)
DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification
by: Liu, Rui, et al.
Published: (2026)
Process Reward Model with Q-Value Rankings
by: Li, Wendi, et al.
Published: (2024)
Process-based Self-Rewarding Language Models
by: Zhang, Shimao, et al.
Published: (2025)
Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond
by: Han, Soyeon Caren, et al.
Published: (2024)
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision
by: Pala, Tej Deep, et al.
Published: (2025)