Staff View: Enhancing Tool Calling in LLMs with the International Tool Calling Dataset :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Zuoyu; Zhu, Yancheng
Format:	Preprint
Published:	2026
Subjects:	Human-Computer Interaction
Online Access:	https://arxiv.org/abs/2603.05515

_version_	1866918373641682944
author	Zhang, Zuoyu; Zhu, Yancheng
author_facet	Zhang, Zuoyu; Zhu, Yancheng
contents	Tool calling allows large language models (LLMs) to interact with external systems like APIs, enabling applications in customer support, data analysis, and dynamic content generation. While recent benchmarks have advanced tool-use research, they suffer from key limitations, including reliance on simulated or restricted APIs, limited reproducibility, and a lack of cultural and geographic diversity. To address these gaps, we introduce International Tool Calling (ITC), a large-scale, multilingual benchmark designed for realistic, globally distributed tool-calling scenarios. ITC includes 3,571 real APIs and 17,540 tool calling tasks across 20 categories and 40 countries. Experiments reveal substantial performance gaps between open- and closed-source LLMs, while fine-tuning on ITC yields significant improvements, particularly for non-English queries, enhancing cross-lingual generalization, reasoning consistency, and robustness to out-of-domain tools. ITC provides a valuable benchmark for advancing LLM robustness and performance in complex, multi-tool, and international scenarios. Dataset: https://anonymous.4open.science/r/International-Tool-Calling-ITC-dataset-FAF4/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_05515
institution	arXiv
publishDate	2026
record_format	arxiv
title	Enhancing Tool Calling in LLMs with the International Tool Calling Dataset
topic	Human-Computer Interaction
url	https://arxiv.org/abs/2603.05515

Similar Items