Staff View: MDDM: A Multi-view Discriminative Enhanced Diffusion-based Model for Speech Enhancement :: Library Catalog

Bibliographic Details
Main Authors:	Xu, Nan; Huang, Zhaolong; Zhi, Xiaonan
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2505.13029

_version_	1866914122948411392
author	Xu, Nan; Huang, Zhaolong; Zhi, Xiaonan
author_facet	Xu, Nan; Huang, Zhaolong; Zhi, Xiaonan
contents	With the development of deep learning, speech enhancement has improved greatly in terms of speech quality. Previous methods typically rely on either discriminative supervised learning or generative modeling, which tend to introduce speech distortions or incur high computational cost. In this paper, we propose MDDM, a Multi-view Discriminative enhanced Diffusion-based Model. Specifically, we take features from three domains (time, frequency and noise) as inputs to a discriminative prediction network, which generates a preliminary spectrogram. The discriminative output is then converted to clean speech through several inference sampling steps. Because the distributions of the discriminative output and the clean target overlap, fewer sampling steps are enough to achieve performance competitive with other diffusion-based methods. Experiments conducted on a public dataset and a real-world dataset validate the effectiveness of MDDM on both subjective and objective metrics.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_13029
institution	arXiv
publishDate	2025
record_format	arxiv
title	MDDM: A Multi-view Discriminative Enhanced Diffusion-based Model for Speech Enhancement
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2505.13029

Similar Items