Staff View: Making Sense of Korean Sentences: A Comprehensive Evaluation of LLMs through KoSEnd Dataset :: Library Catalog

Bibliographic Details
Main Authors:	Yu, Seunguk; Kim, Kyeonghyun; Yun, Jungmin; Kim, Youngbin
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2507.03378

_version_	1866908434145738752
author	Yu, Seunguk; Kim, Kyeonghyun; Yun, Jungmin; Kim, Youngbin
author_facet	Yu, Seunguk; Kim, Kyeonghyun; Yun, Jungmin; Kim, Youngbin
contents	Although LLMs have made significant progress in various languages, there are still concerns about their effectiveness with low-resource agglutinative languages compared to languages such as English. In this study, we focused on Korean, a language known for its complex sentence endings, and evaluated LLMs on this challenging aspect. We introduce the Korean Sentence Endings (KoSEnd) dataset, which includes 3,000 sentences, each annotated for the naturalness of 15 sentence ending forms. These were collected from diverse sources to cover a range of contexts. We evaluated 11 LLMs to assess their understanding of Korean sentence endings, analyzing them based on parameter count and prediction consistency. Notably, we found that informing models about the possibility of missing sentence endings improved performance, highlighting the impact of explicitly considering certain linguistic features.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_03378
institution	arXiv
publishDate	2025
record_format	arxiv
title	Making Sense of Korean Sentences: A Comprehensive Evaluation of LLMs through KoSEnd Dataset
topic	Computation and Language
url	https://arxiv.org/abs/2507.03378

Similar Items