| Main Authors: | Wan, Qian, Xu, Ziao, Wei, Luona, Shen, Xiaoxuan, Sun, Jianwen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.21418 |
Similar Items
Mitigating Overthinking in Large Reasoning Models via Manifold Steering
by: Huang, Yao, et al.
Published: (2025)
Beyond Error-Based Optimization: Experience-Driven Symbolic Regression with Goal-Conditioned Reinforcement Learning
by: Sun, Jianwen, et al.
Published: (2026)
Automated discovery of symbolic laws governing skill acquisition from naturally occurring data
by: Liu, Sannyuya, et al.
Published: (2024)
Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation
by: Bin, Yi, et al.
Published: (2025)
DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
by: Shen, Yi, et al.
Published: (2025)
Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models
by: Liu, Yongjiang, et al.
Published: (2025)
ROM: Real-time Overthinking Mitigation via Streaming Detection and Intervention
by: Wang, Xinyan, et al.
Published: (2026)
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
by: Fan, Chenrui, et al.
Published: (2025)
Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective
by: Kong, Deyang, et al.
Published: (2025)
Reinforcement-aware Knowledge Distillation for LLM Reasoning
by: Zhang, Zhaoyang, et al.
Published: (2026)
DARE: Difficulty-Adaptive Reinforcement Learning with Co-Evolved Difficulty Estimation
by: Zhou, Yang, et al.
Published: (2026)
DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data
by: Zhou, Yuhang, et al.
Published: (2025)
What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language Models
by: He, Chen, et al.
Published: (2025)
Mitigating Hallucinations in Large Language Models via Causal Reasoning
by: Li, Yuangang, et al.
Published: (2025)
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
by: Fu, Wei, et al.
Published: (2025)
Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training
by: Guo, Qingyan, et al.
Published: (2024)
Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary?
by: Jedidi, Nour, et al.
Published: (2025)
Conflict-Aware Fusion: Mitigating Logic Inertia in Large Language Models via Structured Cognitive Priors
by: Bao, Qiming, et al.
Published: (2025)
SAMG: Offline-to-Online Reinforcement Learning via State-Action-Conditional Offline Model Guidance
by: Zhang, Liyu, et al.
Published: (2024)
Learning to Stop Overthinking at Test Time
by: Bao, Hieu Tran, et al.
Published: (2025)
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
by: Xi, Zhiheng, et al.
Published: (2024)
Revisiting Entropy in Reinforcement Learning for Large Reasoning Models
by: Jin, Renren, et al.
Published: (2025)
A Survey of Reinforcement Learning for Large Reasoning Models
by: Zhang, Kaiyan, et al.
Published: (2025)
RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs
by: Fernandez, Nigel, et al.
Published: (2025)
Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models
by: Phan, Hoang, et al.
Published: (2025)
POT: Inducing Overthinking in LLMs via Black-Box Iterative Optimization
by: Li, Xinyu, et al.
Published: (2025)
Mitigating Overthinking through Reasoning Shaping
by: Song, Feifan, et al.
Published: (2025)
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
by: Dai, Muzhi, et al.
Published: (2025)
Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification
by: Liu, Chengwu, et al.
Published: (2025)
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
by: Liu, Yang, et al.
Published: (2025)
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
by: Wang, Yiping, et al.
Published: (2025)
On Predictability of Reinforcement Learning Dynamics for Large Language Models
by: Cai, Yuchen, et al.
Published: (2025)
DecepChain: Inducing Deceptive Reasoning in Large Language Models
by: Shen, Wei, et al.
Published: (2025)
MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning
by: Wang, Hongjun, et al.
Published: (2026)
Optimizing Reasoning Efficiency through Prompt Difficulty Prediction
by: Zhao, Bo, et al.
Published: (2025)
Layer-Aware Influence for Online Data Valuation Estimation
by: Yang, Ziao, et al.
Published: (2025)
Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?
by: Qu, Yun, et al.
Published: (2025)
Overthinking the Truth: Understanding how Language Models Process False Demonstrations
by: Halawi, Danny, et al.
Published: (2023)
Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning
by: Wu, Jiayun, et al.
Published: (2025)
Mitigating Overthinking in Large Reasoning Language Models via Reasoning Path Deviation Monitoring
by: Guan, Weixin, et al.
Published: (2026)