| Main Authors: | Jia, Mengzhao, Zhang, Zhihan, Jiang, Meng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.18892 |
Similar Items
AutoRubric: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning
by: Jia, Mengzhao, et al.
Published: (2025)
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
by: Zhang, Zhihan, et al.
Published: (2024)
Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training
by: Jia, Mengzhao, et al.
Published: (2024)
MMTutorBench: The First Multimodal Benchmark for AI Math Tutoring
by: Yang, Tengchao, et al.
Published: (2025)
MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems
by: Zhu, Zifeng, et al.
Published: (2024)
PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning
by: Zhang, Zhihan, et al.
Published: (2023)
Enhancing Mathematical Reasoning in LLMs by Stepwise Correction
by: Wu, Zhenyu, et al.
Published: (2024)
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
by: Wen, Xumeng, et al.
Published: (2025)
Sandwich Reasoning: An Answer-Reasoning-Answer Approach for Low-Latency Query Correction
by: Zhang, Chen, et al.
Published: (2026)
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
by: Jia, Mengzhao, et al.
Published: (2024)
Beyond Correctness: Rewarding Faithful Reasoning in Retrieval-Augmented Generation
by: Xu, Zhichao, et al.
Published: (2025)
Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench
by: Liu, Zheyuan, et al.
Published: (2024)
The Strongest Teacher Is Not Always the Best Teacher: Student-Centric Answer Selection
by: Hu, Zhengyu, et al.
Published: (2026)
Large Language Models Can Self-Correct with Key Condition Verification
by: Wu, Zhenyu, et al.
Published: (2024)
DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry
by: Cai, Zhenyang, et al.
Published: (2025)
LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision
by: Xu, Jundong, et al.
Published: (2025)
From Answers to Rationales: Self-Aligning Multimodal Reasoning with Answer-Oriented Chain-of-Thought
by: Tan, Wentao, et al.
Published: (2025)
Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning
by: Rajaee, Sara, et al.
Published: (2025)
TOWER: Tree Organized Weighting for Evaluating Complex Instructions
by: Ziems, Noah, et al.
Published: (2024)
DuaShepherd: Integrating Stepwise Correctness and Potential Rewards for Mathematical Reasoning
by: Wu, Yuanhao, et al.
Published: (2025)
Knowing When Not to Answer: Evaluating Abstention in Multimodal Reasoning Systems
by: Madhusudhan, Nishanth, et al.
Published: (2026)
Reliable Reasoning Beyond Natural Language
by: Borazjanizadeh, Nasim, et al.
Published: (2024)
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
by: Huang, Wenxuan, et al.
Published: (2025)
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
by: Peng, Hao, et al.
Published: (2025)
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph
by: Ouyang, Siru, et al.
Published: (2024)
GM-PRM: A Generative Multimodal Process Reward Model for Multimodal Mathematical Reasoning
by: Zhang, Jianghangfan, et al.
Published: (2025)
Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents
by: Kim, Wonjoong, et al.
Published: (2025)
Latent Self-Consistency for Reliable Majority-Set Selection in Short- and Long-Answer Reasoning
by: Oh, Jungsuk, et al.
Published: (2025)
HiMed: Incentivizing Hindi Reasoning in Medical LLMs
by: Jiang, Dingfeng, et al.
Published: (2026)
Correct Answers from Sound Reasoning: Verifiable Process Supervision for Language Models
by: Kim, Kyuyoung, et al.
Published: (2026)
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)
Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards
by: Han, Tianyang, et al.
Published: (2026)
XFinBench: Benchmarking LLMs in Complex Financial Problem Solving and Reasoning
by: Zhang, Zhihan, et al.
Published: (2025)
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
by: Yue, Yang, et al.
Published: (2025)
Triviality Corrected Endogenous Reward
by: Wang, Xinda, et al.
Published: (2026)
Incentivizing In-depth Reasoning over Long Contexts with Process Advantage Shaping
by: Peng, Miao, et al.
Published: (2026)
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
by: Wang, Weiyun, et al.
Published: (2025)
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
by: He, Minggui, et al.
Published: (2025)
Process Rewards with Learned Reliability
by: Li, Jinyuan, et al.
Published: (2026)
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
by: Qu, Xiaoye, et al.
Published: (2025)