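Staff View: Linguistically Informed Graph Model and Semantic Contrastive Learning for Korean Short Text Classification :: Library Catalog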

Bibliographic Details
Main Authors:	Yoo, JaeGeon; Kim, Byoungwook; Yang, Yeongwook; Jang, Hong-Jun
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.03652

_version_	1866912942026391552
author	Yoo, JaeGeon; Kim, Byoungwook; Yang, Yeongwook; Jang, Hong-Jun
author_facet	Yoo, JaeGeon; Kim, Byoungwook; Yang, Yeongwook; Jang, Hong-Jun
contents	Short text classification (STC) remains a challenging task due to the scarcity of contextual information and labeled data. However, existing approaches have predominantly focused on English because most benchmark datasets for STC are primarily available in English. Consequently, existing methods seldom incorporate the linguistic and structural characteristics of Korean, such as its agglutinative morphology and flexible word order. To address these limitations, we propose LIGRAM, a hierarchical heterogeneous graph model for Korean short-text classification. The proposed model constructs sub-graphs at the morpheme, part-of-speech, and named-entity levels and hierarchically integrates them to compensate for the limited contextual information in short texts while precisely capturing the grammatical and semantic dependencies inherent in Korean. In addition, we apply Semantics-aware Contrastive Learning (SemCon) to reflect semantic similarity across documents, enabling the model to establish clearer decision boundaries even in short texts where class distinctions are often ambiguous. We evaluate LIGRAM on four Korean short-text datasets, where it consistently outperforms existing baseline models. These outcomes validate that integrating language-specific graph representations with SemCon provides an effective solution for short text classification in agglutinative languages such as Korean.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_03652
institution	arXiv
publishDate	2026
record_format	arxiv
title	Linguistically Informed Graph Model and Semantic Contrastive Learning for Korean Short Text Classification
topic	Computation and Language
url	https://arxiv.org/abs/2603.03652
