| Main Authors: | Fu, Linya; Liu, Yu; Liu, Zhijie; Yang, Zedong; Wang, Zhong-Qiu; Li, Youfu; Kong, He |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Audio and Speech Processing; Sound |
| Online Access: | https://arxiv.org/abs/2506.02773 |
| _version_ | 1866913872260104192 |
|---|---|
| author | Fu, Linya; Liu, Yu; Liu, Zhijie; Yang, Zedong; Wang, Zhong-Qiu; Li, Youfu; Kong, He |
| author_facet | Fu, Linya; Liu, Yu; Liu, Zhijie; Yang, Zedong; Wang, Zhong-Qiu; Li, Youfu; Kong, He |
| contents | We propose AuralNet, a novel 3D multi-source binaural sound source localization approach that localizes overlapping sources in both azimuth and elevation without prior knowledge of the number of sources. AuralNet employs a gated coarse-to-fine architecture, combining a coarse classification stage with a fine-grained regression stage, allowing for flexible spatial resolution through sector partitioning. The model incorporates a multi-head self-attention mechanism to capture spatial cues in binaural signals, enhancing robustness in noisy-reverberant environments. A masked multi-task loss function is designed to jointly optimize sound detection, azimuth, and elevation estimation. Extensive experiments in noisy-reverberant conditions demonstrate the superiority of AuralNet over recent methods. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2506_02773 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers |
| topic | Audio and Speech Processing; Sound |
| url | https://arxiv.org/abs/2506.02773 |
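
The abstract in the record above names two mechanisms without giving formulas: gated coarse-to-fine decoding over sectors, and a masked multi-task loss. The following is a minimal sketch of how such pieces might fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the function names `decode_sectors` and `masked_multitask_loss`, the 0.5 gate threshold, the use of a normalized within-sector offset in [0, 1) as the regression target, and the choice of binary cross-entropy plus squared error as the loss terms.

```python
import math


def decode_sectors(probs, offsets, sector_width_deg, threshold=0.5):
    """Return one angle (degrees) per sector whose detection gate is open.

    Assumed layout: sector k covers [k * width, (k + 1) * width), and the
    fine stage predicts a normalized offset in [0, 1) inside that sector.
    """
    angles = []
    for k, (p, off) in enumerate(zip(probs, offsets)):
        if p > threshold:  # gate: inactive sectors emit no estimate
            angles.append((k + off) * sector_width_deg)
    return angles


def masked_multitask_loss(probs, az_pred, el_pred, active, az_true, el_true,
                          w_az=1.0, w_el=1.0):
    """Detection loss on all sectors; angle losses only on active sectors.

    `active` is a 0/1 list marking sectors that truly contain a source.
    The weights w_az and w_el are hypothetical balancing factors.
    """
    eps = 1e-7  # keeps log() finite when a probability is exactly 0 or 1
    det = 0.0
    for p, a in zip(probs, active):
        p = min(max(p, eps), 1.0 - eps)
        det -= a * math.log(p) + (1 - a) * math.log(1.0 - p)
    det /= len(probs)

    n_active = sum(active)
    if n_active == 0:
        return det  # nothing to regress when no sector is active

    # The mask zeroes out regression error in sectors without a source.
    az = sum(a * (x - y) ** 2
             for a, x, y in zip(active, az_pred, az_true)) / n_active
    el = sum(a * (x - y) ** 2
             for a, x, y in zip(active, el_pred, el_true)) / n_active
    return det + w_az * az + w_el * el
```

With four 90-degree sectors, `decode_sectors([0.9, 0.1, 0.8, 0.2], [0.5, 0.3, 0.25, 0.9], 90.0)` returns `[45.0, 202.5]`: two sources are reported because two gates are open, so the number of sources never has to be supplied in advance. In this sketch the mask also means that predictions in empty sectors cannot affect the regression terms.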