Staff View: Automatic Text Pronunciation Correlation Generation and Application for Contextual Biasing :: Library Catalog

Bibliographic Details
Main Authors:	Cheng, Gaofeng; Lu, Haitian; Yang, Chengxu; Wang, Xuyang; Li, Ta; Yan, Yonghong
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Computation and Language
Online Access:	https://arxiv.org/abs/2501.00804

_version_	1866929654355460096
author	Cheng, Gaofeng; Lu, Haitian; Yang, Chengxu; Wang, Xuyang; Li, Ta; Yan, Yonghong
author_facet	Cheng, Gaofeng; Lu, Haitian; Yang, Chengxu; Wang, Xuyang; Li, Ta; Yan, Yonghong
contents	Effectively distinguishing the pronunciation correlations between different written texts is a significant issue in linguistic acoustics. Traditionally, such pronunciation correlations are obtained through manually designed pronunciation lexicons. In this paper, we propose a data-driven method to automatically acquire these pronunciation correlations, called automatic text pronunciation correlation (ATPC). The supervision required for this method is the same as that needed for training end-to-end automatic speech recognition (E2E-ASR) systems, i.e., speech and corresponding text annotations. First, the iteratively-trained timestamp estimator (ITSE) algorithm is employed to align the speech with its corresponding annotated text symbols. Then, a speech encoder is used to convert the speech into speech embeddings. Finally, we compare the speech-embedding distances between different text symbols to obtain ATPC. Experimental results on Mandarin show that ATPC enhances E2E-ASR performance in contextual biasing and holds promise for dialects or languages lacking manually designed pronunciation lexicons.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_00804
institution	arXiv
publishDate	2025
record_format	arxiv
title	Automatic Text Pronunciation Correlation Generation and Application for Contextual Biasing
topic	Audio and Speech Processing; Computation and Language
url	https://arxiv.org/abs/2501.00804

Similar Items