| Main Authors: | Eisenberg, Aviad; Gannot, Sharon; Chazan, Shlomo E. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Sound; Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2502.06285 |
| _version_ | 1866913685092433920 |
|---|---|
| author | Eisenberg, Aviad; Gannot, Sharon; Chazan, Shlomo E. |
| author_facet | Eisenberg, Aviad; Gannot, Sharon; Chazan, Shlomo E. |
| contents | This paper introduces a multi-microphone method for extracting a desired speaker from a mixture involving multiple speakers and directional noise in a reverberant environment. In this work, we propose leveraging the instantaneous relative transfer function (RTF), estimated from a reference utterance recorded in the same position as the desired source. The effectiveness of the RTF-based spatial cue is compared with direction of arrival (DOA)-based spatial cue and the conventional spectral embedding. Experimental results in challenging acoustic scenarios demonstrate that using spatial cues yields better performance than the spectral-based cue and that the instantaneous RTF outperforms the DOA-based spatial cue. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2502_06285 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | End-to-End Multi-Microphone Speaker Extraction Using Relative Transfer Functions |
| topic | Sound; Artificial Intelligence |
| url | https://arxiv.org/abs/2502.06285 |