Bibliographic Details
Main Authors: Büthe, Jan, Mustafa, Ahmed, Valin, Jean-Marc, Helwani, Karim, Goodwin, Michael M.
Format: Preprint
Published: 2023
Subjects:
Online Access: https://arxiv.org/abs/2309.14521
author Büthe, Jan
Mustafa, Ahmed
Valin, Jean-Marc
Helwani, Karim
Goodwin, Michael M.
contents Speech codec enhancement methods are designed to remove distortions added by speech codecs. While classical methods are very low in complexity and add zero delay, their effectiveness is rather limited. By comparison, DNN-based methods deliver higher quality, but they are typically high in complexity and/or require delay. The recently proposed Linear Adaptive Coding Enhancer (LACE) addresses this problem by combining DNNs with classical long-term/short-term postfiltering, resulting in a causal low-complexity model. A shortcoming of the LACE model, however, is that quality quickly saturates when the model size is scaled up. To mitigate this problem, we propose a novel adaptive temporal shaping module that adds high temporal resolution to the LACE model, resulting in the Non-Linear Adaptive Coding Enhancer (NoLACE). We adapt NoLACE to enhance the Opus codec and show that NoLACE significantly outperforms both the Opus baseline and an enlarged LACE model at 6, 9, and 12 kb/s. We also show that LACE and NoLACE are well-behaved when used with an ASR system.
format Preprint
id arxiv_https___arxiv_org_abs_2309_14521
institution arXiv
publishDate 2023
record_format arxiv
title NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping
topic Audio and Speech Processing
Sound
url https://arxiv.org/abs/2309.14521