MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Guo, Shiwei; Jiang, Sihang; He, Qianxi; Xiao, Yanghua; Liang, Jiaqing; Yude, Bi; He, Minggui; Tao, Shimin; Zhang, Li
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2512.07075

_version_	1866908697689587712
author	Guo, Shiwei; Jiang, Sihang; He, Qianxi; Xiao, Yanghua; Liang, Jiaqing; Yude, Bi; He, Minggui; Tao, Shimin; Zhang, Li
author_facet	Guo, Shiwei; Jiang, Sihang; He, Qianxi; Xiao, Yanghua; Liang, Jiaqing; Yude, Bi; He, Minggui; Tao, Shimin; Zhang, Li
contents	In recent years, large language models (LLMs) have demonstrated strong performance on multilingual tasks. Cross-cultural understanding is a crucial competency given its wide range of applications. However, existing benchmarks for evaluating whether LLMs genuinely possess this capability suffer from three key limitations: a lack of contextual scenarios, insufficient cross-cultural concept mapping, and limited coverage of deep cultural reasoning. To address these gaps, we propose SAGE, a scenario-based benchmark built via cross-cultural core concept alignment and generative task design, to evaluate LLMs' cross-cultural understanding and reasoning. Grounded in cultural theory, we categorize cross-cultural capabilities into nine dimensions. Using this framework and following established item design principles, we curated 210 core concepts and constructed 4530 test items across 15 specific real-world scenarios, organized under four broader categories of cross-cultural situations. The SAGE dataset supports continuous expansion, and experiments confirm its transferability to other languages. It reveals model weaknesses across both dimensions and scenarios, exposing systematic limitations in cross-cultural reasoning. While progress has been made, LLMs are still some distance away from a truly nuanced cross-cultural understanding. In compliance with the anonymity policy, we include data and code in the supplementary materials. In future versions, we will make them publicly available online.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_07075
institution	arXiv
publishDate	2025
record_format	arxiv
title	Do Large Language Models Truly Understand Cross-cultural Differences?
topic	Computation and Language
url	https://arxiv.org/abs/2512.07075
