Bibliographic Details
Main Authors: Fu, Linya, Liu, Yu, Liu, Zhijie, Yang, Zedong, Wang, Zhong-Qiu, Li, Youfu, Kong, He
Format: Preprint
Published: 2025
Subjects: Audio and Speech Processing, Sound
Online Access: https://arxiv.org/abs/2506.02773
author Fu, Linya
Liu, Yu
Liu, Zhijie
Yang, Zedong
Wang, Zhong-Qiu
Li, Youfu
Kong, He
contents We propose AuralNet, a novel 3D multi-source binaural sound source localization approach that localizes overlapping sources in both azimuth and elevation without prior knowledge of the number of sources. AuralNet employs a gated coarse-to-fine architecture, combining a coarse classification stage with a fine-grained regression stage, allowing for flexible spatial resolution through sector partitioning. The model incorporates a multi-head self-attention mechanism to capture spatial cues in binaural signals, enhancing robustness in noisy-reverberant environments. A masked multi-task loss function is designed to jointly optimize sound detection, azimuth estimation, and elevation estimation. Extensive experiments in noisy-reverberant conditions demonstrate the superiority of AuralNet over recent methods.
format Preprint
id arxiv_https___arxiv_org_abs_2506_02773
institution arXiv
publishDate 2025
record_format arxiv
title AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers
topic Audio and Speech Processing
Sound
url https://arxiv.org/abs/2506.02773