Bibliographic details
Main authors: Chen, Yukun; Zhang, Xinyu; Deng, Boyi; Tang, Jialong; Wan, Yu; Huang, Fei; Zhou, Yuxi; Yang, Baosong; Li, Yiming
Format: Preprint
Published: 2026
Online access: https://arxiv.org/abs/2602.17283
Abstract: As large language models (LLMs) are deployed worldwide, existing evaluation paradigms for their multilingual capabilities focus primarily on factual task performance and neglect the ability to judge the deep-level values of content across languages. To bridge this gap, we first identify two primary challenges in constructing values judgment benchmarks, namely cultural diversity and disciplinary complexity, and propose a novel two-stage human-AI collaborative annotation framework to mitigate them. This framework identifies the scope and nature of each issue, establishes specific annotation criteria, and uses multiple LLMs for the final review. Building on this framework, we introduce X-Value, the first Cross-lingual Values Judgment Benchmark, designed to evaluate how well LLMs judge the deep-level values of content. X-Value comprises 4,750 question-answer pairs across 14 languages, covers 7 major categories of global issues, and provides 12 types of fine-grained annotation metadata to support rigorous evaluation of model performance. We systematically evaluate 17 LLMs on X-Value using different prompting strategies. A multi-dimensional analysis of accuracy and F1 scores reveals the models' limitations in cross-lingual values judgment and shows performance disparities across categories and languages. This work highlights the urgent need to improve the underlying, values-aware content judgment capability of LLMs. (Samples of X-Value are available at https://huggingface.co/datasets/Whitolf/X-Value.)
Institution: arXiv
Title: Towards Cross-lingual Values Judgment: A Consensus-Pluralism Perspective
Topics: Computation and Language; Artificial Intelligence