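Staff View: GRETEL: A Goal-driven Retrieval and Execution-based Trial Framework for LLM Tool Selection Enhancing :: Library Catalog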

Bibliographic Details
Main Authors:	Wu, Zongze; Guo, Yani; Liang, Churong; Li, Runnan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Software Engineering; H.3.3; I.2.8
Online Access:	https://arxiv.org/abs/2510.17843

_version_	1866918164543045632
author	Wu, Zongze; Guo, Yani; Liang, Churong; Li, Runnan
author_facet	Wu, Zongze; Guo, Yani; Liang, Churong; Li, Runnan
contents	Despite remarkable advances in Large Language Model capabilities, tool retrieval for agent-based systems remains fundamentally limited by reliance on semantic similarity, which fails to capture functional viability. Current methods often retrieve textually relevant but functionally inoperative tools due to parameter mismatches, authentication failures, and execution constraints, a phenomenon we term the semantic-functional gap. We introduce GRETEL to address this gap through systematic empirical validation. GRETEL implements an agentic workflow that processes semantically retrieved candidates through sandboxed plan-execute-evaluate cycles, generating execution-grounded evidence to distinguish truly functional tools from merely descriptive matches. Our comprehensive evaluation on the ToolBench benchmark demonstrates substantial improvements across all metrics: Pass Rate (at 10) increases from 0.690 to 0.826, Recall (at 10) improves from 0.841 to 0.867, and NDCG (at 10) rises from 0.807 to 0.857. These results establish that execution-based validation provides a more reliable foundation for tool selection than semantic similarity alone, enabling more robust agent performance in real-world applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_17843
institution	arXiv
publishDate	2025
record_format	arxiv
title	GRETEL: A Goal-driven Retrieval and Execution-based Trial Framework for LLM Tool Selection Enhancing
topic	Machine Learning; Artificial Intelligence; Software Engineering; H.3.3; I.2.8
url	https://arxiv.org/abs/2510.17843
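
Illustrative note on the abstract in the contents field: the plan-execute-evaluate idea can be sketched as a re-ranking step in which each semantically retrieved candidate is given a trial call, and candidates whose trial succeeds are promoted above those whose trial fails. This is a minimal sketch under stated assumptions, not the authors' implementation: the names Candidate, rerank_by_execution, and the trial callables are hypothetical, the sandbox is stood in for by plain Python callables, and the binary pass/fail scoring rule is an assumption that the record does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """A retrieved tool (hypothetical structure, not from the paper)."""
    name: str
    semantic_score: float       # score from the first-stage semantic retriever
    trial: Callable[[], bool]   # stand-in for a sandboxed trial call; True on success


def rerank_by_execution(candidates: List[Candidate], top_k: int = 10) -> List[str]:
    """Re-rank candidates so that tools whose trial call succeeded come first."""
    scored = []
    for cand in candidates:
        try:
            ok = bool(cand.trial())
        except Exception:
            # A trial that raises (e.g. an authentication error) counts as a failure.
            ok = False
        scored.append((ok, cand.semantic_score, cand.name))
    # Sort by trial outcome first, then by semantic score, both descending.
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [name for _, _, name in scored[:top_k]]


def _auth_failure() -> bool:
    # Simulates a tool that matches the query textually but cannot be called.
    raise PermissionError("missing API key")


candidates = [
    Candidate("weather_v1", 0.92, _auth_failure),   # best text match, not callable
    Candidate("weather_v2", 0.88, lambda: True),    # slightly lower match, works
    Candidate("geo_lookup", 0.75, lambda: False),   # runs but returns a failure
]
ranking = rerank_by_execution(candidates)
print(ranking)  # ['weather_v2', 'weather_v1', 'geo_lookup']
```

In this toy run the highest-scoring semantic match drops below a working alternative, which is the kind of semantic-functional gap the abstract describes.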

Similar Items