Staff View: AS-Speech: Adaptive Style For Speech Synthesis :: Library Catalog

Bibliographic Details
Main Authors:	Li, Zhipeng; Xing, Xiaofen; Wang, Jun; Chen, Shuaiqi; Yu, Guoqiao; Wan, Guanglu; Xu, Xiangmin
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2409.05730

_version_	1866913494179250176
author	Li, Zhipeng; Xing, Xiaofen; Wang, Jun; Chen, Shuaiqi; Yu, Guoqiao; Wan, Guanglu; Xu, Xiangmin
author_facet	Li, Zhipeng; Xing, Xiaofen; Wang, Jun; Chen, Shuaiqi; Yu, Guoqiao; Wan, Guanglu; Xu, Xiangmin
contents	In recent years, there has been significant progress in Text-to-Speech (TTS) synthesis technology, enabling the high-quality synthesis of voices in common scenarios. In unseen situations, adaptive TTS requires a strong generalization capability to speaker style characteristics. However, the existing adaptive methods can only extract and integrate coarse-grained timbre or mixed rhythm attributes separately. In this paper, we propose AS-Speech, an adaptive style methodology that integrates the speaker timbre characteristics and rhythmic attributes into a unified framework for text-to-speech synthesis. Specifically, AS-Speech can accurately simulate style characteristics through fine-grained text-based timbre features and global rhythm information, and achieve high-fidelity speech synthesis through the diffusion model. Experiments show that the proposed model produces voices with higher naturalness and similarity in terms of timbre and rhythm compared to a series of adaptive TTS models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_05730
institution	arXiv
publishDate	2024
record_format	arxiv
title	AS-Speech: Adaptive Style For Speech Synthesis
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2409.05730
