| Main Authors: | Liang, Qiao, Zhu, Yuke, Ge, Chao, Yang, Lei, Shen, Ying, Zheng, Bo, Guo, Sheng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.09598 |
Similar Items
Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty
by: Yu, Zewei, et al.
Published: (2026)
PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning
by: Wu, Feijie, et al.
Published: (2025)
AdaTIR: Adaptive Tool-Integrated Reasoning via Difficulty-Aware Policy Optimization
by: Fang, Zhaiyu, et al.
Published: (2026)
Multi-Agent Tool-Integrated Policy Optimization
by: Mo, Zhanfeng, et al.
Published: (2025)
AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning
by: Zou, Jiaru, et al.
Published: (2025)
Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
by: Xu, Ran, et al.
Published: (2025)
Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis
by: Zhao, Yufeng, et al.
Published: (2025)
Empowering Multi-Turn Tool-Integrated Agentic Reasoning with Group Turn Policy Optimization
by: Ding, Yifeng, et al.
Published: (2025)
Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning
by: Ma, Qinghe, et al.
Published: (2026)
Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning
by: Cheng, Qianjia, et al.
Published: (2026)
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
by: Li, Yang, et al.
Published: (2025)
Interpreting and Controlling LLM Reasoning through Integrated Policy Gradient
by: Li, Changming, et al.
Published: (2026)
IAPO: Information-Aware Policy Optimization for Token-Efficient Reasoning
by: He, Yinhan, et al.
Published: (2026)
Temporal Consistency for LLM Reasoning Process Error Identification
by: Guo, Jiacheng, et al.
Published: (2025)
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
by: He, Yancheng, et al.
Published: (2025)
Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning
by: Deng, Jingcheng, et al.
Published: (2026)
LLM Agents Already Know When to Call Tools -- Even Without Reasoning
by: Sun, Chung-En, et al.
Published: (2026)
When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning
by: Wei, Jiaqi, et al.
Published: (2026)
Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning
by: Wang, Ziyan, et al.
Published: (2025)
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
by: Liang, Xiao, et al.
Published: (2025)
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
by: Dong, Guanting, et al.
Published: (2025)
Guiding LLM-based Loop Invariant Synthesis via Feedback on Local Reasoning Errors
by: Li, Tianchi, et al.
Published: (2026)
Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use
by: Pang, Renning, et al.
Published: (2026)
Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees
by: Li, Kun, et al.
Published: (2026)
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning
by: Wang, Jianing, et al.
Published: (2026)
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
by: Chang, Qikai, et al.
Published: (2025)
Not All Errors Are Created Equal: ASCoT Addresses Late-Stage Fragility in Efficient LLM Reasoning
by: Zhang, Dongxu, et al.
Published: (2025)
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction
by: Ko, Yuka, et al.
Published: (2024)
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
by: Zhang, Xichen, et al.
Published: (2025)
Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools
by: Wu, Junde, et al.
Published: (2025)
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
by: Gou, Zhibin, et al.
Published: (2023)
DPEPO: Diverse Parallel Exploration Policy Optimization for LLM-based Agents
by: Zhang, Junshuo, et al.
Published: (2026)
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
by: Lu, Meng, et al.
Published: (2025)
NCV: A Node-Wise Consistency Verification Approach for Low-Cost Structured Error Localization in LLM Reasoning
by: Zhang, Yulong, et al.
Published: (2025)
Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning
by: Zhai, Naixin, et al.
Published: (2026)
Contextual Drag: How Errors in the Context Affect LLM Reasoning
by: Cheng, Yun, et al.
Published: (2026)
ToolDreamer: Instilling LLM Reasoning Into Tool Retrievers
by: Sengupta, Saptarshi, et al.
Published: (2025)
Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-wise Distillation
by: Yang, Runyan, et al.
Published: (2025)
Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning
by: Xu, Ningning, et al.
Published: (2025)
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
by: Xi, Zhiheng, et al.
Published: (2025)