:: Library Catalog

Bibliographic Details
Main Authors:	Rahimi, Akam; Afouras, Triantafyllos; Zisserman, Andrew
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2501.01401

Similar Items

Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
by: Rahimi, Akam, et al.
Published: (2025)

Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation
by: Lee, Jaejun, et al.
Published: (2025)

IDMap: A Pseudo-Speaker Generator Framework Based on Speaker Identity Index to Vector Mapping
by: Liu, Zeyan, et al.
Published: (2025)

Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels
by: Aldeneh, Zakaria, et al.
Published: (2024)

Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments
by: Opochinsky, Renana, et al.
Published: (2024)

Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control
by: Murata, Masato, et al.
Published: (2025)

Inference Attacks for X-Vector Speaker Anonymization
by: Bauer, Luke, et al.
Published: (2025)

Coherence-Based Frequency Subset Selection For Binaural RTF-Vector-Based Direction of Arrival Estimation for Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2022)

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets
by: Tao, Ruijie, et al.
Published: (2024)

NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
by: Park, Nohil, et al.
Published: (2024)

The VoxCeleb Speaker Recognition Challenge: A Retrospective
by: Huh, Jaesung, et al.
Published: (2024)

Neural Forward Filtering for Speaker-Image Separation
by: Sun, Jingqi, et al.
Published: (2025)

Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
by: Thienpondt, Jenthe, et al.
Published: (2024)

Generating Novel and Realistic Speakers for Voice Conversion
by: Chen, Meiying Melissa, et al.
Published: (2025)

Exploiting an External Microphone for Binaural RTF-Vector-Based Direction of Arrival Estimation for Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2023)

Language-Invariant Multilingual Speaker Verification for the TidyVoice 2026 Challenge
by: Li, Ze, et al.
Published: (2026)

Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization
by: Tao, Ruijie, et al.
Published: (2024)

Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection
by: Zeng, Bang, et al.
Published: (2025)

Residual Speaker Representation for One-Shot Voice Conversion
by: Xu, Le, et al.
Published: (2023)

Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization
by: Cheng, Ming, et al.
Published: (2024)

On the Use of Self-Supervised Representation Learning for Speaker Diarization and Separation
by: Baroudi, Séverin, et al.
Published: (2025)

TidyVoice: A Curated Multilingual Dataset for Speaker Verification Derived from Common Voice
by: Farhadipour, Aref, et al.
Published: (2026)

Profile-Error-Tolerant Target-Speaker Voice Activity Detection
by: Wang, Dongmei, et al.
Published: (2023)

MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
by: Prajwal, K R, et al.
Published: (2024)

Adaptive Speaker Embedding Self-Augmentation for Personal Voice Activity Detection with Short Enrollment Speech
by: Feng, Fuyuan, et al.
Published: (2026)

Efficient Area-based and Speaker-Agnostic Source Separation
by: Strauss, Martin, et al.
Published: (2024)

Attacking Voice Anonymization Systems with Augmented Feature and Speaker Identity Difference
by: Zhang, Yanzhe, et al.
Published: (2024)

Phase Aware Ear-Conditioned Learning for Multi-Channel Binaural Speaker Separation
by: Jeremiah, Ruben Johnson Robert, et al.
Published: (2025)

EEND-SAA: Enrollment-Less Main Speaker Voice Activity Detection Using Self-Attention Attractors
by: Wu, Wen-Yung, et al.
Published: (2025)

3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization
by: Chen, Yafeng, et al.
Published: (2024)

Sparse Direction of Arrival Estimation Method Based on Vector Signal Reconstruction with a Single Vector Sensor
by: Guo, Jiabin
Published: (2024)

Quantifying and Reducing Speaker Heterogeneity within the Common Voice Corpus for Phonetic Analysis
by: Zhang, Miao, et al.
Published: (2025)

JoyVoice: Long-Context Conditioning for Anthropomorphic Multi-Speaker Conversational Synthesis
by: Yu, Fan, et al.
Published: (2025)

Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2023)

Moving Speaker Separation via Parallel Spectral-Spatial Processing
by: Wang, Yuzhu, et al.
Published: (2026)

Reproducing the Acoustic Velocity Vectors in a Spherical Listening Region
by: Wang, Jiarui, et al.
Published: (2023)

Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models
by: Bereuter, Paul A., et al.
Published: (2025)

Multiple Speaker Separation from Noisy Sources in Reverberant Rooms using Relative Transfer Matrix
by: Manamperi, Wageesha N., et al.
Published: (2025)

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)

Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers
by: Wang, Yuzhu, et al.
Published: (2025)