Bibliographic Details
Main Authors: Zhang, Kaituo, Xiong, Zhen, Zhong, Mingyu, Jiang, Zhimeng, Yuan, Zhouyuan, Li, Zhecheng, Lin, Ying
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2605.00136
_version_ 1866918477026033664
author Zhang, Kaituo
Xiong, Zhen
Zhong, Mingyu
Jiang, Zhimeng
Yuan, Zhouyuan
Li, Zhecheng
Lin, Ying
contents Tool-augmented reasoning has become a popular direction for LLM-based agents, and it is widely assumed to improve reasoning and reliability. However, we demonstrate that this consensus does not always hold: in the presence of semantic distractors, tool-augmented reasoning does not necessarily outperform native CoT. To explain this performance gap, we propose a Factorized Intervention Framework that isolates the cost of prompt formatting, the overhead of the tool-calling protocol, and the actual gain from executing tools. Our analysis reveals a critical tradeoff: under semantic noise, the gains from tools often fail to offset the "tool-use tax", which is the performance degradation introduced by the tool-calling protocol itself. To address this, we introduce G-STEP, a lightweight inference-time gate to mitigate protocol-induced errors. While this yields partial recovery, our findings suggest that more substantial improvements still require strengthening the model's intrinsic reasoning and tool-interaction capabilities.
format Preprint
id arxiv_https___arxiv_org_abs_2605_00136
institution arXiv
publishDate 2026
record_format arxiv
title Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents
topic Artificial Intelligence
url https://arxiv.org/abs/2605.00136