| Main Authors: | Zhang, Kaituo; Xiong, Zhen; Zhong, Mingyu; Jiang, Zhimeng; Yuan, Zhouyuan; Li, Zhecheng; Lin, Ying |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2605.00136 |
| _version_ | 1866918477026033664 |
|---|---|
| author | Zhang, Kaituo; Xiong, Zhen; Zhong, Mingyu; Jiang, Zhimeng; Yuan, Zhouyuan; Li, Zhecheng; Lin, Ying |
| author_facet | Zhang, Kaituo; Xiong, Zhen; Zhong, Mingyu; Jiang, Zhimeng; Yuan, Zhouyuan; Li, Zhecheng; Lin, Ying |
| contents | Tool-augmented reasoning has become a popular direction for LLM-based agents, and it is widely assumed to improve reasoning and reliability. However, we demonstrate that this consensus does not always hold: in the presence of semantic distractors, tool-augmented reasoning does not necessarily outperform native CoT. To explain this performance gap, we propose a Factorized Intervention Framework that isolates the cost of prompt formatting, the overhead of the tool-calling protocol, and the actual gain from executing tools. Our analysis reveals a critical tradeoff: under semantic noise, the gains from tools often fail to offset the "tool-use tax", which is the performance degradation introduced by the tool-calling protocol itself. To address this, we introduce G-STEP, a lightweight inference-time gate to mitigate protocol-induced errors. While this yields partial recovery, our findings suggest that more substantial improvements still require strengthening the model's intrinsic reasoning and tool-interaction capabilities. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2605_00136 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2605.00136 |