Staff View: When to Trust Tools? Adaptive Tool Trust Calibration For Tool-Integrated Math Reasoning :: Library Catalog

Bibliographic Details
Main Authors:	Xu, Ruotao; Ji, Yixin; Luo, Yu; Li, Jinpeng; Li, Dong; Li, Peifeng; Li, Juntao; Zhang, Min
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.08281

_version_	1866914481695621120
author	Xu, Ruotao; Ji, Yixin; Luo, Yu; Li, Jinpeng; Li, Dong; Li, Peifeng; Li, Juntao; Zhang, Min
author_facet	Xu, Ruotao; Ji, Yixin; Luo, Yu; Li, Jinpeng; Li, Dong; Li, Peifeng; Li, Juntao; Zhang, Min
contents	Large reasoning models (LRMs) have achieved strong performance gains by scaling test-time computation, but due to the inherent limitations of the underlying language models, they still fall short on tasks that require precise computation and extensive knowledge. Tool-Integrated Reasoning (TIR) has emerged as a promising paradigm that incorporates tool calls and their execution into the reasoning trajectory. Although recent work has released several powerful open-source TIR models, our analysis reveals that these models still suffer from critical deficiencies. We find that when the model's reasoning conflicts with the tool results, the model tends to trust its own reasoning. In some cases the tool results are correct but the model ignores them and produces an incorrect answer, a failure we define as "Tool Ignored". This indicates that the model does not know when to trust or ignore the tool. To overcome these limitations, we introduce Adaptive Tool Trust Calibration (ATTC), a novel framework that guides the model to adaptively trust or ignore tool results based on the confidence score of the generated code blocks. Experiments on open-source TIR models of different sizes across multiple datasets demonstrate that ATTC effectively reduces the "Tool Ignored" issue, yielding a performance improvement of 4.1% to 7.5%.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_08281
institution	arXiv
publishDate	2026
record_format	arxiv
title	When to Trust Tools? Adaptive Tool Trust Calibration For Tool-Integrated Math Reasoning
topic	Computation and Language
url	https://arxiv.org/abs/2604.08281