Bibliographic Details
Main Authors: Zhang, Zuoyu; Zhu, Yancheng
Format: Preprint
Published: 2026
Subjects: Human-Computer Interaction
Online Access: https://arxiv.org/abs/2603.05515
author Zhang, Zuoyu
Zhu, Yancheng
author_facet Zhang, Zuoyu
Zhu, Yancheng
contents Tool calling allows large language models (LLMs) to interact with external systems such as APIs, enabling applications in customer support, data analysis, and dynamic content generation. While recent benchmarks have advanced tool-use research, they suffer from key limitations, including reliance on simulated or restricted APIs, limited reproducibility, and a lack of cultural and geographic diversity. To address these gaps, we introduce International Tool Calling (ITC), a large-scale, multilingual benchmark designed for realistic, globally distributed tool-calling scenarios. ITC includes 3,571 real APIs and 17,540 tool-calling tasks across 20 categories and 40 countries. Experiments reveal substantial performance gaps between open- and closed-source LLMs, while fine-tuning on ITC yields significant improvements, particularly for non-English queries, enhancing cross-lingual generalization, reasoning consistency, and robustness to out-of-domain tools. ITC provides a valuable benchmark for advancing LLM robustness and performance in complex, multi-tool, and international scenarios. Dataset: https://anonymous.4open.science/r/International-Tool-Calling-ITC-dataset-FAF4/.
format Preprint
id arxiv_https___arxiv_org_abs_2603_05515
institution arXiv
publishDate 2026
record_format arxiv
title Enhancing Tool Calling in LLMs with the International Tool Calling Dataset
topic Human-Computer Interaction
url https://arxiv.org/abs/2603.05515