

Bibliographic Details
Main Authors:	Liu, Yutong; Xiao, Feng; Zhang, Ziyue; Yu, Yongbin; Huang, Cheng; Gao, Fan; Wang, Xiangxiang; Ban, Ma-bao; Fan, Manping; Tsering, Thupten; Luosang, Gadeng; Duojie, Renzeng; Tashi, Nyima
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2505.08037

Table of Contents:

Multi-level Tibetan spelling correction addresses errors at both the character and syllable levels within a unified model. Existing methods focus mainly on single-level correction and lack effective integration of both levels. Moreover, there are no open-source datasets or augmentation methods tailored for this task in Tibetan. To address these gaps, we propose a data augmentation approach that uses unlabeled text to generate multi-level corruptions, and we introduce TiSpell, a semi-masked model capable of correcting both character- and syllable-level errors. Although syllable-level correction is more challenging due to its reliance on global context, our semi-masked strategy simplifies this process. We synthesize nine types of corruptions on clean sentences to create a robust training set. Experiments on both simulated and real-world data demonstrate that TiSpell, trained on our dataset, outperforms baseline models and matches the performance of state-of-the-art approaches, confirming its effectiveness.
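
The abstract does not list the nine corruption types, so the following is only a minimal sketch of how multi-level corruptions might be synthesized from clean, unlabeled text. It assumes syllables are separated by the Tibetan tsheg (U+0F0B) and uses two illustrative corruptions: one at the character level (deleting a code point) and one at the syllable level (swapping adjacent syllables). The function names, the `p_char` parameter, and the choice of corruptions are hypothetical and are not taken from the paper.

```python
import random

TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)


def split_syllables(sentence):
    """Split a sentence into syllables on the tsheg, dropping empty pieces."""
    return [s for s in sentence.split(TSHEG) if s]


def corrupt_character(syllable, rng):
    """Character-level corruption (illustrative): delete one code point.

    Works on raw code points, so it may detach a vowel sign or subjoined
    letter from its base; a real pipeline would likely be more careful.
    """
    if len(syllable) < 2:
        return syllable
    i = rng.randrange(len(syllable))
    return syllable[:i] + syllable[i + 1:]


def corrupt_syllables(syllables, rng):
    """Syllable-level corruption (illustrative): swap two adjacent syllables."""
    out = list(syllables)
    if len(out) < 2:
        return out
    i = rng.randrange(len(out) - 1)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def make_pair(clean, rng, p_char=0.3):
    """Return a (noisy, clean) training pair built from one clean sentence.

    p_char is a hypothetical per-syllable probability of applying the
    character-level corruption after the syllable-level one.
    """
    noisy = corrupt_syllables(split_syllables(clean), rng)
    noisy = [
        corrupt_character(s, rng) if rng.random() < p_char else s
        for s in noisy
    ]
    return TSHEG.join(noisy), clean


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    clean = TSHEG.join(["བཀྲ", "ཤིས", "བདེ", "ལེགས"])
    noisy, target = make_pair(clean, rng)
    print(noisy, "->", target)
```

Pairs produced this way give a correction model noisy input with the clean sentence as the target; punctuation such as a trailing tsheg or shad is ignored here for brevity.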

Similar Items