Bibliographic Details
Main Authors: Wang, Xiaonan, Yeo, Jinyoung, Lim, Joon-Ho, Kim, Hansaem
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2412.07251
author Wang, Xiaonan
Yeo, Jinyoung
Lim, Joon-Ho
Kim, Hansaem
contents Large language models have shown substantial performance gains across a wide range of tasks. However, as these models generate increasingly fluent and coherent content, evaluating them becomes more complex. Current multilingual benchmarks are often translated from English versions and may therefore carry Western cultural biases that prevent them from accurately assessing other languages and cultures. To address this research gap, we introduce KULTURE Bench, an evaluation framework designed specifically for Korean culture, with datasets of cultural news, idioms, and poetry. It assesses language models' cultural comprehension and reasoning capabilities at the word, sentence, and paragraph levels. Using KULTURE Bench, we evaluated models trained on different language corpora and analyzed the results comprehensively. The results show that the models still have substantial room for improvement in understanding texts related to the deeper aspects of Korean culture.
format Preprint
id arxiv_https___arxiv_org_abs_2412_07251
institution arXiv
publishDate 2024
record_format arxiv
title KULTURE Bench: A Benchmark for Assessing Language Model in Korean Cultural Context
topic Computation and Language
url https://arxiv.org/abs/2412.07251