MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Chen, Yukun; Zhang, Xinyu; Deng, Boyi; Tang, Jialong; Wan, Yu; Huang, Fei; Zhou, Yuxi; Yang, Baosong; Li, Yiming
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.17283

_version_	1866915997095559168
author	Chen, Yukun; Zhang, Xinyu; Deng, Boyi; Tang, Jialong; Wan, Yu; Huang, Fei; Zhou, Yuxi; Yang, Baosong; Li, Yiming
contents	As large language models (LLMs) are deployed worldwide, existing evaluation paradigms for their multilingual capabilities focus primarily on factual task performance and neglect the ability to judge the deep-level values of content across multiple languages. To bridge this gap, we first identify two primary challenges in constructing values judgment benchmarks, cultural diversity and disciplinary complexity, and propose a novel two-stage human-AI collaborative annotation framework to alleviate them. This framework identifies the scope and nature of each issue, establishes specific annotation criteria, and uses multiple LLMs for final review. Building on this framework, we introduce X-Value, the first Cross-lingual Values Judgment Benchmark designed to evaluate the capability of LLMs to judge the deep-level values of content. X-Value comprises 4,750 question-answer pairs across 14 languages, covers 7 major categories of global issues, and provides 12 fine-grained annotation metadata fields to support rigorous evaluation of model performance. We systematically evaluate 17 LLMs on X-Value using distinct prompting strategies. A multi-dimensional analysis of accuracy and F1-scores reveals the models' limitations in cross-lingual values judgment and shows performance disparities across categories and languages. This work highlights the urgent need to improve the underlying, values-aware content judgment capability of LLMs. (Samples of X-Value are available at https://huggingface.co/datasets/Whitolf/X-Value.)
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_17283
institution	arXiv
publishDate	2026
record_format	arxiv
title	Towards Cross-lingual Values Judgment: A Consensus-Pluralism Perspective
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2602.17283
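The abstract reports accuracy and F1-scores broken down by language and issue category. The snippet below is a minimal sketch of such a per-language breakdown, not the authors' evaluation code. It assumes a hypothetical binary label set ("acceptable" / "unacceptable") and hypothetical records of the form (language, gold label, predicted label), because this catalog record does not describe X-Value's actual label schema or file format. Macro-F1 (the unweighted mean of per-label F1) is used here as one common choice; the paper may aggregate differently.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (language, gold_label, predicted_label).
# The label names used below are assumptions, not X-Value's real schema.
Record = Tuple[str, str, str]


def f1_for_label(golds: List[str], preds: List[str], label: str) -> float:
    """One-vs-rest F1 for a single label; returns 0.0 when there are no true positives."""
    tp = sum(1 for g, p in zip(golds, preds) if g == label and p == label)
    fp = sum(1 for g, p in zip(golds, preds) if g != label and p == label)
    fn = sum(1 for g, p in zip(golds, preds) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_language_scores(
    records: Sequence[Record], labels: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Group records by language, then compute accuracy and macro-F1 per group."""
    grouped: Dict[str, Tuple[List[str], List[str]]] = defaultdict(lambda: ([], []))
    for lang, gold, pred in records:
        grouped[lang][0].append(gold)
        grouped[lang][1].append(pred)

    scores: Dict[str, Dict[str, float]] = {}
    for lang, (golds, preds) in grouped.items():
        accuracy = sum(g == p for g, p in zip(golds, preds)) / len(golds)
        macro_f1 = sum(f1_for_label(golds, preds, lab) for lab in labels) / len(labels)
        scores[lang] = {"accuracy": accuracy, "macro_f1": macro_f1}
    return scores


if __name__ == "__main__":
    # Toy, made-up predictions purely for illustration.
    toy = [
        ("en", "acceptable", "acceptable"),
        ("en", "unacceptable", "acceptable"),
        ("zh", "unacceptable", "unacceptable"),
        ("zh", "acceptable", "acceptable"),
    ]
    print(per_language_scores(toy, ("acceptable", "unacceptable")))
```

Grouping by issue category instead of language only requires swapping the first tuple field; reporting macro-F1 alongside accuracy helps expose models that score well simply by always predicting the majority label.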

Similar Items