| Main Authors: | Rahimi, Akam, Afouras, Triantafyllos, Zisserman, Andrew |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.01401 |
Similar Items
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
by: Rahimi, Akam, et al.
Published: (2025)
Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation
by: Lee, Jaejun, et al.
Published: (2025)
IDMap: A Pseudo-Speaker Generator Framework Based on Speaker Identity Index to Vector Mapping
by: Liu, Zeyan, et al.
Published: (2025)
Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels
by: Aldeneh, Zakaria, et al.
Published: (2024)
Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments
by: Opochinsky, Renana, et al.
Published: (2024)
Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control
by: Murata, Masato, et al.
Published: (2025)
Inference Attacks for X-Vector Speaker Anonymization
by: Bauer, Luke, et al.
Published: (2025)
Coherence-Based Frequency Subset Selection For Binaural RTF-Vector-Based Direction of Arrival Estimation for Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2022)
Voice Conversion Augmentation for Speaker Recognition on Defective Datasets
by: Tao, Ruijie, et al.
Published: (2024)
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
by: Park, Nohil, et al.
Published: (2024)
The VoxCeleb Speaker Recognition Challenge: A Retrospective
by: Huh, Jaesung, et al.
Published: (2024)
Neural Forward Filtering for Speaker-Image Separation
by: Sun, Jingqi, et al.
Published: (2025)
Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
by: Thienpondt, Jenthe, et al.
Published: (2024)
Generating Novel and Realistic Speakers for Voice Conversion
by: Chen, Meiying Melissa, et al.
Published: (2025)
Exploiting an External Microphone for Binaural RTF-Vector-Based Direction of Arrival Estimation for Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2023)
Language-Invariant Multilingual Speaker Verification for the TidyVoice 2026 Challenge
by: Li, Ze, et al.
Published: (2026)
Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization
by: Tao, Ruijie, et al.
Published: (2024)
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection
by: Zeng, Bang, et al.
Published: (2025)
Residual Speaker Representation for One-Shot Voice Conversion
by: Xu, Le, et al.
Published: (2023)
Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization
by: Cheng, Ming, et al.
Published: (2024)
On the Use of Self-Supervised Representation Learning for Speaker Diarization and Separation
by: Baroudi, Séverin, et al.
Published: (2025)
TidyVoice: A Curated Multilingual Dataset for Speaker Verification Derived from Common Voice
by: Farhadipour, Aref, et al.
Published: (2026)
Profile-Error-Tolerant Target-Speaker Voice Activity Detection
by: Wang, Dongmei, et al.
Published: (2023)
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
by: Prajwal, K R, et al.
Published: (2024)
Adaptive Speaker Embedding Self-Augmentation for Personal Voice Activity Detection with Short Enrollment Speech
by: Feng, Fuyuan, et al.
Published: (2026)
Efficient Area-based and Speaker-Agnostic Source Separation
by: Strauss, Martin, et al.
Published: (2024)
Attacking Voice Anonymization Systems with Augmented Feature and Speaker Identity Difference
by: Zhang, Yanzhe, et al.
Published: (2024)
Phase Aware Ear-Conditioned Learning for Multi-Channel Binaural Speaker Separation
by: Jeremiah, Ruben Johnson Robert, et al.
Published: (2025)
EEND-SAA: Enrollment-Less Main Speaker Voice Activity Detection Using Self-Attention Attractors
by: Wu, Wen-Yung, et al.
Published: (2025)
3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization
by: Chen, Yafeng, et al.
Published: (2024)
Sparse Direction of Arrival Estimation Method Based on Vector Signal Reconstruction with a Single Vector Sensor
by: Guo, Jiabin
Published: (2024)
Quantifying and Reducing Speaker Heterogeneity within the Common Voice Corpus for Phonetic Analysis
by: Zhang, Miao, et al.
Published: (2025)
JoyVoice: Long-Context Conditioning for Anthropomorphic Multi-Speaker Conversational Synthesis
by: Yu, Fan, et al.
Published: (2025)
Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2023)
Moving Speaker Separation via Parallel Spectral-Spatial Processing
by: Wang, Yuzhu, et al.
Published: (2026)
Reproducing the Acoustic Velocity Vectors in a Spherical Listening Region
by: Wang, Jiarui, et al.
Published: (2023)
Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models
by: Bereuter, Paul A., et al.
Published: (2025)
Multiple Speaker Separation from Noisy Sources in Reverberant Rooms using Relative Transfer Matrix
by: Manamperi, Wageesha N., et al.
Published: (2025)
Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)
Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers
by: Wang, Yuzhu, et al.
Published: (2025)