Bibliographic Details
Main Authors: Yu, Guochen; Han, Runqiang; Xu, Chenglin; Zhao, Haoran; Li, Nan; Zhang, Chen; Zheng, Xiguang; Zhou, Chao; Huang, Qi; Yu, Bing
Format: Preprint
Published: 2024
Subjects: Sound; Audio and Speech Processing
Online Access: https://arxiv.org/abs/2402.01808
contents This paper presents the speech restoration and enhancement system created by the 1024K team for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. Our system consists of a complex-domain generative adversarial network (GAN) for speech restoration and a fine-grained multi-band fusion module for speech enhancement. On the SSI blind test set, the proposed system achieves an overall mean opinion score (MOS) of 3.49 based on ITU-T P.804 and a word accuracy rate (WAcc) of 0.78 in the real-time track, and an overall P.804 MOS of 3.43 and a WAcc of 0.78 in the non-real-time track, ranking 1st in both tracks.
institution arXiv
title KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge
topic Sound
Audio and Speech Processing