Staff View: NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping :: Library Catalog

Bibliographic Details
Main Authors:	Büthe, Jan; Mustafa, Ahmed; Valin, Jean-Marc; Helwani, Karim; Goodwin, Michael M.
Format:	Preprint
Published:	2023
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2309.14521

_version_	1866910294917251072
author	Büthe, Jan; Mustafa, Ahmed; Valin, Jean-Marc; Helwani, Karim; Goodwin, Michael M.
author_facet	Büthe, Jan; Mustafa, Ahmed; Valin, Jean-Marc; Helwani, Karim; Goodwin, Michael M.
contents	Speech codec enhancement methods are designed to remove distortions added by speech codecs. While classical methods are very low in complexity and add zero delay, their effectiveness is rather limited. Compared to that, DNN-based methods deliver higher quality but they are typically high in complexity and/or require delay. The recently proposed Linear Adaptive Coding Enhancer (LACE) addresses this problem by combining DNNs with classical long-term/short-term postfiltering resulting in a causal low-complexity model. A shortcoming of the LACE model is, however, that quality quickly saturates when the model size is scaled up. To mitigate this problem, we propose a novel adaptive temporal shaping module that adds high temporal resolution to the LACE model resulting in the Non-Linear Adaptive Coding Enhancer (NoLACE). We adapt NoLACE to enhance the Opus codec and show that NoLACE significantly outperforms both the Opus baseline and an enlarged LACE model at 6, 9 and 12 kb/s. We also show that LACE and NoLACE are well-behaved when used with an ASR system.
format	Preprint
id	arxiv_https___arxiv_org_abs_2309_14521
institution	arXiv
publishDate	2023
record_format	arxiv
spellingShingle	NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping Büthe, Jan Mustafa, Ahmed Valin, Jean-Marc Helwani, Karim Goodwin, Michael M. Audio and Speech Processing Sound Speech codec enhancement methods are designed to remove distortions added by speech codecs. While classical methods are very low in complexity and add zero delay, their effectiveness is rather limited. Compared to that, DNN-based methods deliver higher quality but they are typically high in complexity and/or require delay. The recently proposed Linear Adaptive Coding Enhancer (LACE) addresses this problem by combining DNNs with classical long-term/short-term postfiltering resulting in a causal low-complexity model. A shortcoming of the LACE model is, however, that quality quickly saturates when the model size is scaled up. To mitigate this problem, we propose a novel adaptive temporal shaping module that adds high temporal resolution to the LACE model resulting in the Non-Linear Adaptive Coding Enhancer (NoLACE). We adapt NoLACE to enhance the Opus codec and show that NoLACE significantly outperforms both the Opus baseline and an enlarged LACE model at 6, 9 and 12 kb/s. We also show that LACE and NoLACE are well-behaved when used with an ASR system.
title	NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping
topic	Audio and Speech Processing; Sound
url	https://arxiv.org/abs/2309.14521

Similar Items