Staff View: AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers :: Library Catalog

Bibliographic Details
Main Authors:	Fu, Linya; Liu, Yu; Liu, Zhijie; Yang, Zedong; Wang, Zhong-Qiu; Li, Youfu; Kong, He
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2506.02773

_version_	1866913872260104192
author	Fu, Linya Liu, Yu Liu, Zhijie Yang, Zedong Wang, Zhong-Qiu Li, Youfu Kong, He
author_facet	Fu, Linya Liu, Yu Liu, Zhijie Yang, Zedong Wang, Zhong-Qiu Li, Youfu Kong, He
contents	We propose AuralNet, a novel 3D multi-source binaural sound source localization approach that localizes overlapping sources in both azimuth and elevation without prior knowledge of the number of sources. AuralNet employs a gated coarse-to-fine architecture, combining a coarse classification stage with a fine-grained regression stage, allowing for flexible spatial resolution through sector partitioning. The model incorporates a multi-head self-attention mechanism to capture spatial cues in binaural signals, enhancing robustness in noisy-reverberant environments. A masked multi-task loss function is designed to jointly optimize sound detection, azimuth, and elevation estimation. Extensive experiments in noisy-reverberant conditions demonstrate the superiority of AuralNet over recent methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_02773
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers Fu, Linya Liu, Yu Liu, Zhijie Yang, Zedong Wang, Zhong-Qiu Li, Youfu Kong, He Audio and Speech Processing Sound We propose AuralNet, a novel 3D multi-source binaural sound source localization approach that localizes overlapping sources in both azimuth and elevation without prior knowledge of the number of sources. AuralNet employs a gated coarse-to-fine architecture, combining a coarse classification stage with a fine-grained regression stage, allowing for flexible spatial resolution through sector partitioning. The model incorporates a multi-head self-attention mechanism to capture spatial cues in binaural signals, enhancing robustness in noisy-reverberant environments. A masked multi-task loss function is designed to jointly optimize sound detection, azimuth, and elevation estimation. Extensive experiments in noisy-reverberant conditions demonstrate the superiority of AuralNet over recent methods.
title	AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers
topic	Audio and Speech Processing; Sound
url	https://arxiv.org/abs/2506.02773