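| Main Authors: | Yu, Guochen; Han, Runqiang; Xu, Chenglin; Zhao, Haoran; Li, Nan; Zhang, Chen; Zheng, Xiguang; Zhou, Chao; Huang, Qi; Yu, Bing |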
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Sound; Audio and Speech Processing |
| Online Access: | https://arxiv.org/abs/2402.01808 |
| Field | Value |
|---|---|
| author | Yu, Guochen; Han, Runqiang; Xu, Chenglin; Zhao, Haoran; Li, Nan; Zhang, Chen; Zheng, Xiguang; Zhou, Chao; Huang, Qi; Yu, Bing |
| contents | This paper presents the speech restoration and enhancement system created by the 1024K team for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. Our system consists of a generative adversarial network (GAN) in the complex domain for speech restoration and a fine-grained multi-band fusion module for speech enhancement. On the SSI blind test set, the proposed system achieves an overall mean opinion score (MOS) of 3.49 based on ITU-T P.804 and a Word Accuracy Rate (WAcc) of 0.78 for the real-time track, as well as an overall P.804 MOS of 3.43 and a WAcc of 0.78 for the non-real-time track, ranking 1st in both tracks. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2402_01808 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge |
| topic | Sound; Audio and Speech Processing |
| url | https://arxiv.org/abs/2402.01808 |