:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Yun; Liu, Xuechen; Miao, Xiaoxiao; Yamagishi, Junichi
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2603.04943

Similar Items

Target Speaker Extraction with Curriculum Learning
by: Liu, Yun, et al.
Published: (2024)

Libri2Vox Dataset: Target Speaker Extraction with Diverse Speaker Conditions and Synthetic Data
by: Liu, Yun, et al.
Published: (2024)

Quantifying Source Speaker Leakage in One-to-One Voice Conversion
by: Wellington, Scott, et al.
Published: (2025)

Improving curriculum learning for target speaker extraction with synthetic speakers
by: Liu, Yun, et al.
Published: (2024)

Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
by: Chen, Zhengyang, et al.
Published: (2024)

Explaining Speaker and Spoof Embeddings via Probing
by: Liu, Xuechen, et al.
Published: (2024)

Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches
by: Zeng, Chang, et al.
Published: (2024)

Zero-Day Audio DeepFake Detection via Retrieval Augmentation and Profile Matching
by: Liu, Xuechen, et al.
Published: (2025)

Mitigating Language Mismatch in SSL-Based Speaker Anonymization
by: Zhang, Zhe, et al.
Published: (2025)

A Preliminary Case Study on Long-Form In-the-Wild Audio Spoofing Detection
by: Liu, Xuechen, et al.
Published: (2024)

LENS-DF: Deepfake Detection and Temporal Localization for Long-Form Noisy Speech
by: Liu, Xuechen, et al.
Published: (2025)

AfriHuBERT: A self-supervised speech representation model for African languages
by: Alabi, Jesujoba O., et al.
Published: (2024)

From Sharpness to Better Generalization for Speech Deepfake Detection
by: Huang, Wen, et al.
Published: (2025)

Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion
by: Chen, Zhengyang, et al.
Published: (2024)

Training-Free Multi-Step Inference for Target Speaker Extraction
by: You, Zhenghai, et al.
Published: (2026)

Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)

Generalizing Speaker Verification for Spoof Awareness in the Embedding Space
by: Liu, Xuechen, et al.
Published: (2024)

Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis
by: Wang, Xin, et al.
Published: (2024)

Robust Audio-Visual Target Speaker Extraction with Emotion-Aware Multiple Enrollment Fusion
by: Jin, Zhan, et al.
Published: (2025)

Brainprint-Modulated Target Speaker Extraction
by: Han, Qiushi, et al.
Published: (2025)

Enhancing Target Speaker Extraction with Explicit Speaker Consistency Modeling
by: Wu, Shu, et al.
Published: (2025)

Beyond Speaker Identity: Text Guided Target Speech Extraction
by: Huo, Mingyue, et al.
Published: (2025)

SecureSpeech: Prompt-based Speaker and Content Protection
by: Hui, Belinda Soh Hui, et al.
Published: (2025)

USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
by: Zeng, Bang, et al.
Published: (2024)

SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization
by: Tang, Beilong, et al.
Published: (2025)

Enhancing Intelligibility for Generative Target Speech Extraction via Joint Optimization with Target Speaker ASR
by: Ma, Hao, et al.
Published: (2025)

Improved Feature Extraction Network for Neuro-Oriented Target Speaker Extraction
by: Fan, Cunhang, et al.
Published: (2025)

ASVspoof 5: Evaluation of Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech
by: Wang, Xin, et al.
Published: (2026)

On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
by: Li, Junjie, et al.
Published: (2024)

Listen to Extract: Onset-Prompted Target Speaker Extraction
by: Shen, Pengjie, et al.
Published: (2025)

Binaural Target Speaker Extraction using Individualized HRTF
by: Ellinson, Yoav, et al.
Published: (2025)

Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection
by: Zeng, Bang, et al.
Published: (2025)

Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device
by: Li, Zixuan, et al.
Published: (2025)

Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
by: Liu, Bei, et al.
Published: (2024)

The Third VoicePrivacy Challenge: Preserving Emotional Expressiveness and Linguistic Content in Voice Anonymization
by: Tomashenko, Natalia, et al.
Published: (2026)

Binaural Selective Attention Model for Target Speaker Extraction
by: Meng, Hanyu, et al.
Published: (2024)

FlowTSE: Target Speaker Extraction with Flow Matching
by: Navon, Aviv, et al.
Published: (2025)

Self Voice Conversion as an Attack against Neural Audio Watermarking
by: Özer, Yigitcan, et al.
Published: (2026)

M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction
by: Fan, Cunhang, et al.
Published: (2025)

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)