Staff View: FA-GAN: Artifacts-free and Phase-aware High-fidelity GAN-based Vocoder :: Library Catalog

Bibliographic Details
Main Authors:	Shen, Rubing; Ren, Yanzhen; Sun, Zongkun
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.04575

_version_	1866916313594593280
author	Shen, Rubing; Ren, Yanzhen; Sun, Zongkun
author_facet	Shen, Rubing; Ren, Yanzhen; Sun, Zongkun
contents	Generative adversarial network (GAN) based vocoders have achieved significant attention in speech synthesis with high quality and fast inference speed. However, there still exist many noticeable spectral artifacts, resulting in the quality decline of synthesized speech. In this work, we adopt a novel GAN-based vocoder designed for few artifacts and high fidelity, called FA-GAN. To suppress the aliasing artifacts caused by non-ideal upsampling layers in high-frequency components, we introduce the anti-aliased twin deconvolution module in the generator. To alleviate blurring artifacts and enrich the reconstruction of spectral details, we propose a novel fine-grained multi-resolution real and imaginary loss to assist in the modeling of phase information. Experimental results reveal that FA-GAN outperforms the compared approaches in promoting audio quality and alleviating spectral artifacts, and exhibits superior performance when applied to unseen speaker scenarios.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_04575
institution	arXiv
publishDate	2024
record_format	arxiv
title	FA-GAN: Artifacts-free and Phase-aware High-fidelity GAN-based Vocoder
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2407.04575