Staff View: SenSE: Semantic-Aware High-Fidelity Universal Speech Enhancement :: Library Catalog

Bibliographic Details
Main Authors:	Li, Xingchen; Xie, Hanke; Wang, Ziqian; Zhang, Zihan; Xiao, Longshuai; Wang, Shuai; Xie, Lei
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2509.24708

_version_	1866914444061179904
author	Li, Xingchen; Xie, Hanke; Wang, Ziqian; Zhang, Zihan; Xiao, Longshuai; Wang, Shuai; Xie, Lei
author_facet	Li, Xingchen; Xie, Hanke; Wang, Ziqian; Zhang, Zihan; Xiao, Longshuai; Wang, Shuai; Xie, Lei
contents	Generative universal speech enhancement (USE) methods aim to leverage generative models to improve speech quality under various types of distortion. However, existing generative speech enhancement methods often suffer from semantic inconsistency in the generated outputs. We therefore propose SenSE, a novel two-stage generative universal speech enhancement framework. By modeling semantic priors with a language model, SenSE guides the flow matching-based speech enhancement process to generate semantically faithful speech, thereby improving context fidelity. In addition, we introduce a dual-path masked conditioning training strategy that enables flow matching-based enhancement to flexibly integrate multi-source conditioning signals from degraded speech, semantic tokens, and reference speech, thereby improving model flexibility and adaptability. Experimental results demonstrate that SenSE achieves state-of-the-art performance among generative speech enhancement models and exhibits a high performance ceiling, particularly under challenging distortion conditions. Code and demos are available at https://github.com/ASLP-lab/SenSE.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_24708
institution	arXiv
publishDate	2025
record_format	arxiv
title	SenSE: Semantic-Aware High-Fidelity Universal Speech Enhancement
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2509.24708
