| Main Author: | Wang, Zhijie |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.14041 |
Similar Items
Beyond the First Error: Process Reward Models for Reflective Mathematical Reasoning
by: Yang, Zhaohui, et al.
Published: (2025)
GRPO is Secretly a Process Reward Model
by: Sullivan, Michael, et al.
Published: (2025)
Large Language Models and Mathematical Reasoning Failures
by: Boye, Johan, et al.
Published: (2025)
Coarse-to-Fine Process Reward Modeling for Mathematical Reasoning
by: Hu, Yulan, et al.
Published: (2025)
Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning
by: Xiao, Wenyi, et al.
Published: (2025)
A Survey on Large Language Models for Mathematical Reasoning
by: Wang, Peng-Yuan, et al.
Published: (2025)
Mathematical Computation and Reasoning Errors by Large Language Models
by: Zhang, Liang, et al.
Published: (2025)
Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning
by: Zhu, Jiachen, et al.
Published: (2025)
Evaluating Robustness of Reward Models for Mathematical Reasoning
by: Kim, Sunghwan, et al.
Published: (2024)
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
by: Xu, Yuanda, et al.
Published: (2026)
GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO
by: Dipta, Shubhashis Roy, et al.
Published: (2026)
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
by: Dai, Yanqi, et al.
Published: (2026)
A Survey on Mathematical Reasoning and Optimization with Large Language Models
by: Forootani, Ali
Published: (2025)
Reasoning Relay: Evaluating Stability and Interchangeability of Large Language Models in Mathematical Reasoning
by: Lu, Leo, et al.
Published: (2025)
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)
M2A: Synergizing Mathematical and Agentic Reasoning in Large Language Models
by: Wang, Junjian, et al.
Published: (2026)
Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO
by: Pappone, Francesco, et al.
Published: (2025)
Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning
by: Rajaee, Sara, et al.
Published: (2025)
The Lessons of Developing Process Reward Models in Mathematical Reasoning
by: Zhang, Zhenru, et al.
Published: (2025)
AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward
by: Huang, Runhui, et al.
Published: (2026)
Teaching Large Reasoning Models Effective Reflection
by: Wang, Hanbin, et al.
Published: (2026)
Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model
by: Zhu, Xunyu, et al.
Published: (2024)
Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning
by: He, Qianxi, et al.
Published: (2025)
GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy
by: Tan, Hongze, et al.
Published: (2025)
From Reasoning to Code: GRPO Optimization for Underrepresented Languages
by: Pennino, Federico, et al.
Published: (2025)
ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training
by: Ai, Rui, et al.
Published: (2026)
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning
by: Zhao, Jun, et al.
Published: (2024)
MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting
by: Wei, Kangda, et al.
Published: (2026)
Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients
by: Mansouri, Omar El, et al.
Published: (2025)
Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models
by: Liu, Yan, et al.
Published: (2026)
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning
by: Chen, Jinhao, et al.
Published: (2025)
What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning
by: Ma, Yiran, et al.
Published: (2024)
Revisiting Anthropomorphic Reflection Markers in Large Language Model Reasoning
by: Yu, Yahan, et al.
Published: (2026)
Numerical Sensitivity and Robustness: Exploring the Flaws of Mathematical Reasoning in Large Language Models
by: Sun, Zhishen, et al.
Published: (2025)
Step-GRPO: Internalizing Dynamic Early Exit for Efficient Reasoning
by: Chen, Benteng, et al.
Published: (2026)
Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
by: Zhang, Xiaoying, et al.
Published: (2025)
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
by: Wang, Teng, et al.
Published: (2025)
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
by: Yao, Huanjin, et al.
Published: (2025)
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
by: Mirzadeh, Iman, et al.
Published: (2024)
CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge
by: Zan, Lei, et al.
Published: (2025)