Bibliographic Details
Main Authors:	Rozenfeld, Vadim; Goldshtein, Bracha Laufer
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2603.17377

Similar Items

Conformal Prediction for Manifold-based Source Localization with Gaussian Processes
by: Rozenfeld, Vadim, et al.
Published: (2024)

Uncertainty Quantification in Machine Learning for Joint Speaker Diarization and Identification
by: McKnight, Simon W., et al.
Published: (2023)

Speaker Contrastive Learning for Source Speaker Tracing
by: Wang, Qing, et al.
Published: (2024)

Eliminating Quantization Errors in Classification-Based Sound Source Localization
by: Feng, Linfeng, et al.
Published: (2023)

MVANet: Multi-Stage Video Attention Network for Sound Event Localization and Detection with Source Distance Estimation
by: Hong, Hengyi, et al.
Published: (2024)

Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
by: Shao, Yiwen, et al.
Published: (2024)

The Database and Benchmark for the Source Speaker Tracing Challenge 2024
by: Li, Ze, et al.
Published: (2024)

3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization
by: Chen, Yafeng, et al.
Published: (2024)

Where's That Voice Coming? Continual Learning for Sound Source Localization
by: Xiao, Yang, et al.
Published: (2024)

Xi+: Uncertainty Supervision for Robust Speaker Embedding
by: Li, Junjie, et al.
Published: (2025)

Uncertainty Quantification in Melody Estimation using Histogram Representation
by: Saxena, Kavya Ranjan, et al.
Published: (2025)

Reverberation-Robust Localization of Speakers Using Distinct Speech Onsets and Multi-channel Cross-Correlations
by: Lin, Shoufeng
Published: (2026)

Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection
by: Qian, Xinyuan, et al.
Published: (2024)

TF-Mamba: A Time-Frequency Network for Sound Source Localization
by: Xiao, Yang, et al.
Published: (2024)

Steered Response Power for Sound Source Localization: A Tutorial Review
by: Grinstein, Eric, et al.
Published: (2024)

Pretraining Multi-Speaker Identification for Neural Speaker Diarization
by: Horiguchi, Shota, et al.
Published: (2025)

Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)

MASSLOC: A Massive Sound Source Localization System based on Direction-of-Arrival Estimation
by: Fischer, Georg K. J., et al.
Published: (2025)

MC-LExt: Multi-Channel Target Speaker Extraction with Onset-Prompted Speaker Conditioning Mechanism
by: Ling, Tongtao, et al.
Published: (2025)

Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios
by: Fiorio, Luan Vinícius, et al.
Published: (2025)

CNN-based Robust Sound Source Localization with SRP-PHAT for the Extreme Edge
by: Yin, Jun, et al.
Published: (2025)

LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
by: Yemini, Yochai, et al.
Published: (2023)

Efficient Area-based and Speaker-Agnostic Source Separation
by: Strauss, Martin, et al.
Published: (2024)

Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization
by: Cheng, Ming, et al.
Published: (2024)

UltrasonicSpheres: Localized, Multi-Channel Sound Spheres Using Off-the-Shelf Speakers and Earables
by: Küttner, Michael, et al.
Published: (2025)

Leveraging Sound Source Trajectories for Universal Sound Separation
by: Wu, Donghang, et al.
Published: (2024)

IPDnet: A Universal Direct-Path IPD Estimation Network for Sound Source Localization
by: Wang, Yabo, et al.
Published: (2024)

A Steered Response Power Method for Sound Source Localization With Generic Acoustic Models
by: Müller, Kaspar, et al.
Published: (2025)

Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
by: Jeon, Yejin, et al.
Published: (2024)

Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)

Linearly Constrained Deep Beamformer for Multi-Speaker Scenarios
by: Zaidel, Ilai, et al.
Published: (2026)

Importance-Weighted Domain Adaptation for Sound Source Tracking
by: Zhong, Bingxiang, et al.
Published: (2025)

Multiple Speaker Separation from Noisy Sources in Reverberant Rooms using Relative Transfer Matrix
by: Manamperi, Wageesha N., et al.
Published: (2025)

Speaker Anonymisation for Speech-based Suicide Risk Detection
by: Cui, Ziyun, et al.
Published: (2025)

SRP-PHAT-NET: A Reliability-Driven DNN for Reverberant Speaker Localization
by: Shaybet, Bar, et al.
Published: (2025)

AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition
by: Dai, Yuhang, et al.
Published: (2025)

Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization?
by: Horiguchi, Shota, et al.
Published: (2025)

Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings
by: Horiguchi, Shota, et al.
Published: (2024)

From Independence to Interaction: Speaker-Aware Simulation of Multi-Speaker Conversational Timing
by: Gedeon, Máté, et al.
Published: (2025)

Study on Inter and Intra Speaker Variability in Speaker Recognition
by: Okhotnikov, Anton, et al.
Published: (2024)