Bibliographic Details
Main Authors: Zhang, Shiqi; Qiu, Zheng; Takeuchi, Daiki; Harada, Noboru; Makino, Shoji
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2402.08252
Table of Contents:
  • With the rapid development of neural networks in recent years, networks for single-channel speech enhancement have become highly effective at enhancing the magnitude spectrum of noisy speech. Enhancing the phase spectrum with neural networks, however, is often ineffective and remains a challenging problem. In this paper, we find that the human ear is not sensitive to the difference between a precise phase spectrum and a biased phase (BP) spectrum. We therefore propose an optimization method for phase reconstruction that allows freedom in the global phase bias instead of requiring reconstruction of the precise phase spectrum. We apply it to a Conformer-based Metric Generative Adversarial Network (CMGAN) baseline model; the method relaxes the existing precise-phase constraint and gives the neural network a broader learning space. Results show that the method achieves new state-of-the-art performance without incurring additional computational overhead.
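
The idea of tolerating a global phase bias can be illustrated with a minimal NumPy sketch. This is not the paper's loss function, which the abstract does not specify. It assumes that the bias is a single constant offset shared by all time-frequency bins and estimates that offset as the circular mean of the phase error. The function names `wrap_phase` and `bias_tolerant_phase_loss` are hypothetical.

```python
import numpy as np


def wrap_phase(x):
    """Wrap angles (in radians) into the interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi


def bias_tolerant_phase_loss(est_phase, ref_phase):
    """Mean absolute wrapped phase error after removing one global offset.

    Assumption (not from the paper): the bias is a single constant shared by
    every time-frequency bin, estimated as the circular mean of the error.
    Returns the residual loss and the estimated offset.
    """
    diff = est_phase - ref_phase
    # Circular mean of the phase error: angle of the averaged unit phasors.
    bias = np.angle(np.mean(np.exp(1j * diff)))
    residual = wrap_phase(diff - bias)
    return float(np.mean(np.abs(residual))), float(bias)


# Toy example: an estimate that differs from the reference by a constant offset.
rng = np.random.default_rng(0)
ref = rng.uniform(-np.pi, np.pi, size=(4, 8))  # (frequency bins, frames)
est = ref + 0.7

strict_loss = float(np.mean(np.abs(wrap_phase(est - ref))))
relaxed_loss, offset = bias_tolerant_phase_loss(est, ref)
```

In this toy case a strict phase loss penalizes the constant offset in every bin, while the relaxed loss removes it before measuring the error, so only deviations from a common offset would be penalized.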