MARC21 :: Library Catalog

Bibliographic Details
Main authors:	Shi, Yi; Wang, Congyi; Chen, Yu; Wang, Bin
Format:	Preprint
Published:	2021
Subjects:	Computation and Language; Artificial Intelligence
Online access:	https://arxiv.org/abs/2102.00621

author	Shi, Yi; Wang, Congyi; Chen, Yu; Wang, Bin
contents	The majority of Chinese characters are monophonic, while a special group of characters, called polyphonic characters, have multiple pronunciations. As a prerequisite for performing speech-related generative tasks, the correct pronunciation must be identified among several candidates. This process is called polyphone disambiguation. Although the problem has been well explored with both knowledge-based and learning-based approaches, it remains challenging due to the lack of publicly available labeled datasets and the irregular nature of polyphones in Mandarin Chinese. In this paper, we propose a novel semi-supervised learning (SSL) framework for Mandarin Chinese polyphone disambiguation that can potentially leverage unlimited unlabeled text data. We explore the effect of various proxy labeling strategies, including entropy-thresholding and lexicon-based labeling. Qualitative and quantitative experiments demonstrate that our method achieves state-of-the-art performance. In addition, we publish a novel dataset specifically for the polyphone disambiguation task to promote further research.
format	Preprint
id	arxiv_https___arxiv_org_abs_2102_00621
institution	arXiv
publishDate	2021
record_format	arxiv
title	Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2102.00621
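The abstract names entropy-thresholding and lexicon-based labeling as proxy labeling strategies, but this record does not describe how either one works. The Python sketch below is a hypothetical illustration of the general ideas and is not the authors' implementation. `entropy_threshold_label` keeps a model's most likely pronunciation for an unlabeled character only when the entropy of the predicted distribution is at or below a threshold. `lexicon_label` reads the pronunciation from a word-to-pinyin lexicon. The function names, the threshold value, the numeric-tone pinyin notation, and the toy lexicon are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical sketch of two proxy-labeling ideas for unlabeled text;
# names, threshold, and lexicon are assumptions, not taken from arXiv:2102.00621.


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a predicted distribution over pronunciations."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_threshold_label(
    probs: Sequence[float], candidates: Sequence[str], max_entropy: float
) -> Optional[str]:
    """Return the most likely pronunciation as a proxy label when the model is
    confident (entropy <= max_entropy); otherwise return None to skip the sample."""
    if len(probs) != len(candidates):
        raise ValueError("probs and candidates must have the same length")
    if entropy(probs) > max_entropy:
        return None
    best = max(range(len(probs)), key=lambda i: probs[i])
    return candidates[best]


def lexicon_label(word: str, index: int, lexicon: Dict[str, List[str]]) -> Optional[str]:
    """Return the pronunciation of word[index] if the whole word appears in an
    (assumed) word-to-pinyin lexicon with one syllable per character; else None."""
    syllables = lexicon.get(word)
    if syllables is None or len(syllables) != len(word):
        return None
    return syllables[index]


# Toy lexicon with numeric-tone pinyin (illustrative only).
LEXICON = {"银行": ["yin2", "hang2"], "行走": ["xing2", "zou3"]}

if __name__ == "__main__":
    print(lexicon_label("银行", 1, LEXICON))  # hang2
    print(entropy_threshold_label([0.95, 0.05], ["hang2", "xing2"], 0.3))  # hang2
    print(entropy_threshold_label([0.5, 0.5], ["hang2", "xing2"], 0.3))  # None
```

Skipping low-confidence predictions trades label coverage for label quality. In a sketch like this, the threshold would have to be tuned, for example on held-out labeled data.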