Staff View: End-to-End Multi-Microphone Speaker Extraction Using Relative Transfer Functions :: Library Catalog

Bibliographic Details
Main Authors:	Eisenberg, Aviad; Gannot, Sharon; Chazan, Shlomo E.
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.06285

_version_	1866913685092433920
author	Eisenberg, Aviad; Gannot, Sharon; Chazan, Shlomo E.
author_facet	Eisenberg, Aviad; Gannot, Sharon; Chazan, Shlomo E.
contents	This paper introduces a multi-microphone method for extracting a desired speaker from a mixture involving multiple speakers and directional noise in a reverberant environment. We propose leveraging the instantaneous relative transfer function (RTF), estimated from a reference utterance recorded at the same position as the desired source. The effectiveness of the RTF-based spatial cue is compared with that of a direction of arrival (DOA)-based spatial cue and of the conventional spectral embedding. Experimental results in challenging acoustic scenarios demonstrate that spatial cues yield better performance than the spectral-based cue and that the instantaneous RTF outperforms the DOA-based spatial cue.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_06285
institution	arXiv
publishDate	2025
record_format	arxiv
title	End-to-End Multi-Microphone Speaker Extraction Using Relative Transfer Functions
topic	Sound; Artificial Intelligence
url	https://arxiv.org/abs/2502.06285

Similar Items