Staff View: Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks :: Library Catalog


Bibliographic Details
Main Authors:	Salhab, Mahmoud; Harmanani, Haidar
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.18571

_version_	1866913450139058176
author	Salhab, Mahmoud; Harmanani, Haidar
author_facet	Salhab, Mahmoud; Harmanani, Haidar
contents	Speech bandwidth expansion is crucial for expanding the frequency range of low-bandwidth speech signals, thereby improving audio quality, clarity and perceptibility in digital applications. Its applications span telephony, compression, text-to-speech synthesis, and speech recognition. This paper presents a novel approach using a high-fidelity generative adversarial network. Unlike cascaded systems, our system is trained end-to-end on paired narrowband and wideband speech signals. Our method integrates various bandwidth upsampling ratios into a single unified model specifically designed for speech bandwidth expansion applications. Our approach exhibits robust performance across various bandwidth expansion factors, including those not encountered during training, demonstrating zero-shot capability. To the best of our knowledge, this is the first work to showcase this capability. The experimental results demonstrate that our method outperforms previous end-to-end approaches, as well as interpolation and traditional techniques, showcasing its effectiveness in practical speech enhancement applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_18571
institution	arXiv
publishDate	2024
record_format	arxiv
title	Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks
topic	Sound; Artificial Intelligence; Audio and Speech Processing
url	https://arxiv.org/abs/2407.18571
