| Main Authors: | Ding, Bowen, Chen, Yuhan, Lyv, Jiayang, Yuan, Jiyao, Zhu, Qi, Tian, Shuangshuang, Zhu, Dantong, Wang, Futing, Deng, Heyuan, Mi, Fei, Shang, Lifeng, Lin, Tao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.11470 |
Similar Items
KDRL: Post-Training Reasoning LLMs via Unified Knowledge Distillation and Reinforcement Learning
by: Xu, Hongling, et al.
Published: (2025)
Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model
by: Ding, Bowen, et al.
Published: (2025)
Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning
by: Yu, Erxin, et al.
Published: (2025)
ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models
by: Xue, Boyang, et al.
Published: (2025)
Group Pattern Selection Optimization: Let LRMs Pick the Right Pattern for Reasoning
by: Wang, Hanbin, et al.
Published: (2026)
Teaching Large Reasoning Models Effective Reflection
by: Wang, Hanbin, et al.
Published: (2026)
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
by: Xu, Minrui, et al.
Published: (2026)
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
by: Zuo, Yuxin, et al.
Published: (2025)
Entropy Centroids as Intrinsic Rewards for Test-Time Scaling
by: Zhao, Wenshuo, et al.
Published: (2026)
The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs
by: Chen, Jierun, et al.
Published: (2025)
Benchmarking and Rethinking Knowledge Editing for Large Language Models
by: He, Guoxiu, et al.
Published: (2025)
EssayBench: Evaluating Large Language Models in Multi-Genre Chinese Essay Writing
by: Gao, Fan, et al.
Published: (2025)
Stackelberg Meta-Learning for Strategic Guidance in Multi-Robot Trajectory Planning
by: Zhao, Yuhan, et al.
Published: (2022)
On Data Synthesis and Post-training for Visual Abstract Reasoning
by: Zhu, Ke, et al.
Published: (2025)
Stackelberg Game-Theoretic Trajectory Guidance for Multi-Robot Systems with Koopman Operator
by: Zhao, Yuhan, et al.
Published: (2023)
Beyond Rejection Sampling: Trajectory Fusion for Scaling Mathematical Reasoning
by: Deng, Jie, et al.
Published: (2026)
How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study
by: Zhang, Zhexin, et al.
Published: (2025)
Data Management For Training Large Language Models: A Survey
by: Wang, Zige, et al.
Published: (2023)
ELICIT: LLM Augmentation via External In-Context Capability
by: Wang, Futing, et al.
Published: (2024)
Asymptotically Optimal Depth Fermionic Permutation on 2D Grid Quantum Architecture without Ancillas
by: Li, Dantong, et al.
Published: (2026)
From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks
by: Stephan, Andreas, et al.
Published: (2024)
Guarded Repair for Harm-Aware Post-hoc Replacement of LLM Mathematical Reasoning
by: Xia, Haizhou
Published: (2026)
Heatmap Guided Query Transformers for Robust Astrocyte Detection across Immunostains and Resolutions
by: Zhang, Xizhe, et al.
Published: (2025)
Rethinking Cross-Domain Evaluation for Face Forgery Detection with Semantic Fine-grained Alignment and Mixture-of-Experts
by: Luo, Yuhan, et al.
Published: (2026)
SchoenbAt: Rethinking Attention with Polynomial basis
by: Guo, Yuhan, et al.
Published: (2025)
When to Reason: Semantic Router for vLLM
by: Wang, Chen, et al.
Published: (2025)
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
by: Deng, Yihe, et al.
Published: (2025)
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
by: Li, Xinhao, et al.
Published: (2023)
Rethinking Polarization in Wurtzite Semiconductors
by: Wang, Ding, et al.
Published: (2024)
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning
by: Tan, Zelin, et al.
Published: (2025)
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving
by: Xu, Xin, et al.
Published: (2025)
ViTE: Virtual Graph Trajectory Expert Router for Pedestrian Trajectory Prediction
by: Li, Ruochen, et al.
Published: (2025)
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
by: Deng, Yihe, et al.
Published: (2024)
Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning
by: Wang, Yiming, et al.
Published: (2024)
Rethinking Wireless Communications through Formal Mathematical AI Reasoning
by: Zhao, Changyuan, et al.
Published: (2026)
TraXion: Rethinking Pre-training Frameworks for Mobility and Beyond
by: Hsu, Shang-Ling, et al.
Published: (2026)
Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Reasoning
by: Hu, Boren, et al.
Published: (2026)
Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification
by: Liu, Chengwu, et al.
Published: (2025)
The Mechanism of the Return Decision‐Making of Rural Migrants in China From the Translocal Perspective: The Case of County Towns in Yangzhou
by: Zhang, Jiachen, et al.
Published: (2025)
Same Verdict, Different Reasons: LLM-as-a-Judge and Clinician Disagreement on Medical Chatbot Completeness
by: DeLucia, Alexandra, et al.
Published: (2026)