Bibliographic Details
Main Authors:	Li, Yue; Hindriks, Koen V.; Kunneman, Florian A.
Format:	Preprint
Published:	2024
Subjects:	Robotics; Sound; Audio and Speech Processing; 68T50
Online Access:	https://arxiv.org/abs/2409.06274

_version_	1866910597555159040
author	Li, Yue; Hindriks, Koen V.; Kunneman, Florian A.
author_facet	Li, Yue; Hindriks, Koen V.; Kunneman, Florian A.
contents	Spectral subtraction, widely used for its simplicity, has been employed to address the Robot Ego Speech Filtering (RESF) problem for detecting speech contents of human interruption from robot's single-channel microphone recordings when it is speaking. However, this approach suffers from oversubtraction in the fundamental frequency range (FFR), leading to degraded speech content recognition. To address this, we propose a Two-Mask Conformer-based Metric Generative Adversarial Network (CMGAN) to enhance the detected speech and improve recognition results. Our model compensates for oversubtracted FFR values with high-frequency information and long-term features and then de-noises the new spectrogram. In addition, we introduce an incremental processing method that allows semi-real-time audio processing with streaming input on a network trained on long fixed-length input. Evaluations of two datasets, including one with unseen noise, demonstrate significant improvements in recognition accuracy and the effectiveness of the proposed two-mask approach and incremental processing, enhancing the robustness of the proposed RESF pipeline in real-world HRI scenarios.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_06274
institution	arXiv
publishDate	2024
record_format	arxiv
title	Spectral oversubtraction? An approach for speech enhancement after robot ego speech filtering in semi-real-time
topic	Robotics; Sound; Audio and Speech Processing; 68T50
url	https://arxiv.org/abs/2409.06274