Bibliographic Details
Main Authors: Wu, Yihsuan; Chiu, Yukai; Anthony, Michael; Bai, Mingsian R.
Format: Preprint
Published: 2025
Subjects: Audio and Speech Processing
Online Access: https://arxiv.org/abs/2508.06310
_version_ 1866912527673196544
author Wu, Yihsuan
Chiu, Yukai
Anthony, Michael
Bai, Mingsian R.
contents Drones are becoming increasingly important in search and rescue missions and even in military operations. While most drones are equipped with cameras, drone audition remains underexplored because of the inherent difficulty of mitigating the egonoise generated by the rotors. In this paper, we present a novel technique to address the extremely low signal-to-noise ratio (SNR) problem encountered by microphone-embedded drones. The technique uses a hybrid approach that combines Array Signal Processing (ASP) and Deep Neural Networks (DNN) to enhance the speech signals captured by a six-microphone uniform circular array mounted on a quadcopter. The system localizes the target speaker through beamsteering and enhances speech through a Generalized Sidelobe Canceller-DeepFilterNet 2 (GSC-DF2) system. The system is validated on the DREGON dataset and on measured data. Objective evaluations show that the proposed hybrid approach outperforms four baseline methods at SNRs as low as -30 dB.
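As a rough illustration of two ideas named in the abstract, the sketch below converts an SNR in dB to a power ratio and scans azimuth with a plain delay-and-sum beamformer on a six-microphone uniform circular array. It is not the authors' GSC-DF2 system. The array radius (0.1 m), the test frequency (1 kHz), the speed of sound, the noise-free far-field snapshot, and all function names are assumptions made for this example only.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed value, not from the record)


def snr_db_to_power_ratio(snr_db):
    """Convert an SNR in dB to a linear signal-to-noise power ratio."""
    return 10.0 ** (snr_db / 10.0)


def uca_positions(num_mics=6, radius=0.1):
    """Microphone (x, y) coordinates on a uniform circular array.

    The radius is a hypothetical value chosen for illustration.
    """
    return [
        (radius * math.cos(2 * math.pi * m / num_mics),
         radius * math.sin(2 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]


def steering_vector(azimuth, freq, mics):
    """Far-field plane-wave phase factors for a source at `azimuth` (radians)."""
    k = 2 * math.pi * freq / SPEED_OF_SOUND  # wavenumber
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    return [cmath.exp(1j * k * (x * ux + y * uy)) for x, y in mics]


def localize(snapshot, freq, mics, step_deg=5):
    """Return the azimuth (degrees) that maximizes delay-and-sum output power."""
    best_az, best_power = 0, -1.0
    for deg in range(0, 360, step_deg):
        a = steering_vector(math.radians(deg), freq, mics)
        # Phase-align each channel to the candidate direction, then average.
        y = sum(w.conjugate() * x for w, x in zip(a, snapshot)) / len(mics)
        power = abs(y) ** 2
        if power > best_power:
            best_az, best_power = deg, power
    return best_az


if __name__ == "__main__":
    mics = uca_positions()
    # Noise-free single-frequency snapshot from a source at 60 degrees.
    snapshot = steering_vector(math.radians(60), 1000.0, mics)
    print("estimated azimuth (deg):", localize(snapshot, 1000.0, mics))
    print("power ratio at -30 dB:", snr_db_to_power_ratio(-30.0))
```

At -30 dB the noise power is about a thousand times the speech power, so a simple scan like this one is unlikely to be sufficient on its own; the abstract's system pairs beamsteering with GSC-DF2 enhancement.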
format Preprint
id arxiv_https___arxiv_org_abs_2508_06310
institution arXiv
publishDate 2025
record_format arxiv
title Egonoise Resilient Source Localization and Speech Enhancement for Drones Using a Hybrid Model and Learning-Based Approach
topic Audio and Speech Processing
url https://arxiv.org/abs/2508.06310