Bibliographic Details
Main Authors: Guo, Ruocheng; Dong, Kaiwen; Gao, Xiang; Das, Kamalika
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2602.20426
author Guo, Ruocheng
Dong, Kaiwen
Gao, Xiang
Das, Kamalika
contents While most efforts to improve LLM-based tool-using agents focus on the agent itself (through larger models, better prompting, or fine-tuning), agent performance increasingly plateaus due to the quality of the tool interfaces these agents consume. Tool descriptions are often written for human developers and tolerate ambiguity that agents cannot resolve, particularly as the number of candidate tools grows. Existing approaches to improving tool interfaces (1) require re-running a multi-stage per-tool pipeline (synthesizing queries, executing an agent to collect trajectories, annotating trajectories, and prompting a strong LLM multiple times) for every API that enters the catalog, and (2) typically optimize each tool independently, limiting scalability and generalization to unseen tools. We propose Trace-Free+, a curriculum learning framework that progressively transfers supervision from trace-rich settings to trace-free deployment, encouraging the model to internalize reusable patterns of what makes a tool description effective. To support this approach, we construct a large-scale dataset of high-quality tool interfaces derived from real-world APIs through a principled data synthesis workflow. Experiments on widely adopted benchmarks show that Trace-Free+ improves robustness as tool catalogs scale to 150+ candidates (in scaling experiments, reducing accuracy degradation by 29.23% and improving average query-level success by 60.89% on StableToolBench), generalizes across domains without retraining, and provides complementary gains on top of agent fine-tuning.
format Preprint
id arxiv_https___arxiv_org_abs_2602_20426
institution arXiv
publishDate 2026
record_format arxiv
title Learning to Rewrite Tool Descriptions for Reliable LLM-Agent Tool Use
topic Artificial Intelligence
url https://arxiv.org/abs/2602.20426