Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Ye, Yangfan; Feng, Xiaocheng; Tang, Jialong; Cao, Xiayu; Zhang, Zihan; Feng, Xiachong; Yang, Baosong; Qin, Bing
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2606.01879

_version_	1866910280197341184
author	Ye, Yangfan Feng, Xiaocheng Tang, Jialong Cao, Xiayu Zhang, Zihan Feng, Xiachong Yang, Baosong Qin, Bing
author_facet	Ye, Yangfan Feng, Xiaocheng Tang, Jialong Cao, Xiayu Zhang, Zihan Feng, Xiachong Yang, Baosong Qin, Bing
contents	Existing research largely reduces cultural intelligence in LLMs to a knowledge-level problem, overlooking whether models can effectively utilize their acquired knowledge in realistic scenarios. To bridge this gap, we introduce CultureForest, a benchmark for Cultural Norm Grounded Reasoning. Each question is grounded in a small set of atomic norms, enabling verifiable and attributable evaluation. CultureForest comprises 5,378 examples across 8 domains and 53 countries/regions, and supports a progressive evaluation from multiple-choice to open-ended generation. Extensive experiments reveal that even top-tier models degrade substantially in open-ended settings, accompanied by pronounced cross-region disparities. Through targeted analysis, we uncover several consistent patterns: (1) test-time reasoning yields limited gains and may exacerbate inequity; (2) models exhibit highly shared regional preference structures; (3) model responses are markedly conservative, especially under stricter cultural constraints; and (4) by disentangling cultural knowledge acquisition from cultural reasoning, we show that while LLMs possess substantial cultural knowledge, their performance is further bottlenecked by its effective use. These findings point to a necessary shift from knowledge-centric evaluation toward measuring knowledge-grounded reasoning.
format	Preprint
id	arxiv_https___arxiv_org_abs_2606_01879
institution	arXiv
publishDate	2026
record_format	arxiv
title	CultureForest: Understanding and Evaluating Cultural Norm Grounded Reasoning in LLMs
topic	Computation and Language
url	https://arxiv.org/abs/2606.01879
