:: Library Catalog

Bibliographic Details
Main Authors:	Liang, Qiao; Zhu, Yuke; Ge, Chao; Yang, Lei; Shen, Ying; Zheng, Bo; Guo, Sheng
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.09598

Similar Items

Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty
by: Yu, Zewei, et al.
Published: (2026)

PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning
by: Wu, Feijie, et al.
Published: (2025)

AdaTIR: Adaptive Tool-Integrated Reasoning via Difficulty-Aware Policy Optimization
by: Fang, Zhaiyu, et al.
Published: (2026)

Multi-Agent Tool-Integrated Policy Optimization
by: Mo, Zhanfeng, et al.
Published: (2025)

AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning
by: Zou, Jiaru, et al.
Published: (2025)

Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
by: Xu, Ran, et al.
Published: (2025)

Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis
by: Zhao, Yufeng, et al.
Published: (2025)

Empowering Multi-Turn Tool-Integrated Agentic Reasoning with Group Turn Policy Optimization
by: Ding, Yifeng, et al.
Published: (2025)

Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning
by: Ma, Qinghe, et al.
Published: (2026)

Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning
by: Cheng, Qianjia, et al.
Published: (2026)

Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
by: Li, Yang, et al.
Published: (2025)

Interpreting and Controlling LLM Reasoning through Integrated Policy Gradient
by: Li, Changming, et al.
Published: (2026)

IAPO: Information-Aware Policy Optimization for Token-Efficient Reasoning
by: He, Yinhan, et al.
Published: (2026)

Temporal Consistency for LLM Reasoning Process Error Identification
by: Guo, Jiacheng, et al.
Published: (2025)

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
by: He, Yancheng, et al.
Published: (2025)

Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning
by: Deng, Jingcheng, et al.
Published: (2026)

LLM Agents Already Know When to Call Tools -- Even Without Reasoning
by: Sun, Chung-En, et al.
Published: (2026)

When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning
by: Wei, Jiaqi, et al.
Published: (2026)

Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning
by: Wang, Ziyan, et al.
Published: (2025)

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
by: Liang, Xiao, et al.
Published: (2025)

Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
by: Dong, Guanting, et al.
Published: (2025)

Guiding LLM-based Loop Invariant Synthesis via Feedback on Local Reasoning Errors
by: Li, Tianchi, et al.
Published: (2026)

Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use
by: Pang, Renning, et al.
Published: (2026)

Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees
by: Li, Kun, et al.
Published: (2026)

LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning
by: Wang, Jianing, et al.
Published: (2026)

THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
by: Chang, Qikai, et al.
Published: (2025)

Not All Errors Are Created Equal: ASCoT Addresses Late-Stage Fragility in Efficient LLM Reasoning
by: Zhang, Dongxu, et al.
Published: (2025)

Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction
by: Ko, Yuka, et al.
Published: (2024)

Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
by: Zhang, Xichen, et al.
Published: (2025)

Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools
by: Wu, Junde, et al.
Published: (2025)

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
by: Gou, Zhibin, et al.
Published: (2023)

DPEPO: Diverse Parallel Exploration Policy Optimization for LLM-based Agents
by: Zhang, Junshuo, et al.
Published: (2026)

Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
by: Lu, Meng, et al.
Published: (2025)

NCV: A Node-Wise Consistency Verification Approach for Low-Cost Structured Error Localization in LLM Reasoning
by: Zhang, Yulong, et al.
Published: (2025)

Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning
by: Zhai, Naixin, et al.
Published: (2026)

Contextual Drag: How Errors in the Context Affect LLM Reasoning
by: Cheng, Yun, et al.
Published: (2026)

ToolDreamer: Instilling LLM Reasoning Into Tool Retrievers
by: Sengupta, Saptarshi, et al.
Published: (2025)

Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-wise Distillation
by: Yang, Runyan, et al.
Published: (2025)

Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning
by: Xu, Ningning, et al.
Published: (2025)

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
by: Xi, Zhiheng, et al.
Published: (2025)