Bibliographic Details
Main authors: Liu, Feiyan; Zhao, Siyan; Zhuo, Chenxun; Liu, Tianming; Ge, Bao
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online access: https://arxiv.org/abs/2601.02830
_version_ 1866918274962292736
author Liu, Feiyan
Zhao, Siyan
Zhuo, Chenxun
Liu, Tianming
Ge, Bao
contents Cultural backgrounds shape individuals' perspectives and approaches to problem-solving. Since the emergence of GPT-1 in 2018, large language models (LLMs) have undergone rapid development. To date, the world's ten leading LLM developers are primarily based in China and the United States. To examine whether LLMs released by Chinese and U.S. developers exhibit cultural differences in Chinese-language settings, we evaluate their performance on questions about Chinese culture. This study adopts a direct-questioning paradigm to evaluate models such as GPT-5.1, DeepSeek-V3.2, Qwen3-Max, and Gemini 2.5 Pro. We assess their understanding of traditional Chinese culture, including history, literature, poetry, and related domains. Comparative analyses indicate that the Chinese-developed models generally outperform their U.S. counterparts on these tasks. Among the U.S.-developed models, Gemini 2.5 Pro and GPT-5.1 achieve relatively higher accuracy. The observed performance differences may arise from variations in training data distribution, localization strategies, and the degree of emphasis on Chinese cultural content during model development.
format Preprint
id arxiv_https___arxiv_org_abs_2601_02830
institution arXiv
publishDate 2026
record_format arxiv
title The performances of the Chinese and U.S. Large Language Models on the Topic of Chinese Culture
topic Computation and Language
url https://arxiv.org/abs/2601.02830