MARC21 :: Library Catalog

Bibliographic Details
Main Authors:	Hamad, Hassan; Xu, Yingru; Zhao, Liang; Yan, Wenbo; Gyanchandani, Narendra
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.17052

_version_	1866912659576717312
author	Hamad, Hassan; Xu, Yingru; Zhao, Liang; Yan, Wenbo; Gyanchandani, Narendra
author_facet	Hamad, Hassan; Xu, Yingru; Zhao, Liang; Yan, Wenbo; Gyanchandani, Narendra
contents	Tool-augmented large language models (LLMs) are increasingly employed in real-world applications, but tool usage errors still hinder their reliability. We introduce ToolCritic, a diagnostic framework that evaluates and improves LLM behavior in multi-turn, tool-augmented dialogues. ToolCritic detects eight distinct error types specific to tool-calling (e.g., premature invocation, argument misalignment, and misinterpretation of tool outputs) and provides targeted feedback to the main LLM. The main LLM, assumed to have strong reasoning, task understanding and orchestration capabilities, then revises its response based on ToolCritic's feedback. We systematically define these error categories and construct a synthetic dataset to train ToolCritic. Experimental results on the Schema-Guided Dialogue (SGD) dataset demonstrate that ToolCritic improves tool-calling accuracy by up to 13% over baselines, including zero-shot prompting and self-correction techniques. This represents a promising step toward more robust LLM integration with external tools in real-world dialogue applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_17052
institution	arXiv
publishDate	2025
record_format	arxiv
title	ToolCritic: Detecting and Correcting Tool-Use Errors in Dialogue Systems
topic	Artificial Intelligence
url	https://arxiv.org/abs/2510.17052

Similar Items