Bibliographic Details
Main Authors: Yoo, JaeGeon, Kim, Byoungwook, Yang, Yeongwook, Jang, Hong-Jun
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2603.03652
_version_ 1866912942026391552
author Yoo, JaeGeon
Kim, Byoungwook
Yang, Yeongwook
Jang, Hong-Jun
contents Short text classification (STC) remains a challenging task due to the scarcity of contextual information and labeled data. However, existing approaches have predominantly focused on English because most STC benchmark datasets are available in English. Consequently, existing methods seldom incorporate the linguistic and structural characteristics of Korean, such as its agglutinative morphology and flexible word order. To address these limitations, we propose LIGRAM, a hierarchical heterogeneous graph model for Korean short text classification. The proposed model constructs sub-graphs at the morpheme, part-of-speech, and named-entity levels and integrates them hierarchically, compensating for the limited contextual information in short texts while capturing the grammatical and semantic dependencies inherent in Korean. In addition, we apply Semantics-aware Contrastive Learning (SemCon) to reflect semantic similarity across documents, enabling the model to establish clearer decision boundaries even in short texts where class distinctions are often ambiguous. We evaluate LIGRAM on four Korean short-text datasets, where it consistently outperforms existing baseline models. These results indicate that integrating language-specific graph representations with SemCon provides an effective solution for short text classification in agglutinative languages such as Korean.
format Preprint
id arxiv_https___arxiv_org_abs_2603_03652
institution arXiv
publishDate 2026
record_format arxiv
title Linguistically Informed Graph Model and Semantic Contrastive Learning for Korean Short Text Classification
topic Computation and Language
url https://arxiv.org/abs/2603.03652