Staff View: KULTURE Bench: A Benchmark for Assessing Language Model in Korean Cultural Context :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Xiaonan; Yeo, Jinyoung; Lim, Joon-Ho; Kim, Hansaem
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2412.07251

_version_	1866913604464279552
author	Wang, Xiaonan; Yeo, Jinyoung; Lim, Joon-Ho; Kim, Hansaem
author_facet	Wang, Xiaonan; Yeo, Jinyoung; Lim, Joon-Ho; Kim, Hansaem
contents	Large language models have exhibited significant enhancements in performance across various tasks. However, the complexity of their evaluation increases as these models generate more fluent and coherent content. Current multilingual benchmarks often use translated English versions, which may incorporate Western cultural biases that do not accurately assess other languages and cultures. To address this research gap, we introduce KULTURE Bench, an evaluation framework specifically designed for Korean culture that features datasets of cultural news, idioms, and poetry. It is designed to assess language models' cultural comprehension and reasoning capabilities at the word, sentence, and paragraph levels. Using the KULTURE Bench, we assessed the capabilities of models trained with different language corpora and analyzed the results comprehensively. The results show that there is still significant room for improvement in the models' understanding of texts related to the deeper aspects of Korean culture.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_07251
institution	arXiv
publishDate	2024
record_format	arxiv
title	KULTURE Bench: A Benchmark for Assessing Language Model in Korean Cultural Context
topic	Computation and Language
url	https://arxiv.org/abs/2412.07251

Similar Items