:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Wang, Shuai, Qian, Yanmin, Yu, Kai
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2512.18286
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
by: Liu, Bei, et al.
Published: (2024)

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
by: Wang, Shuai, et al.
Published: (2024)

Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
by: Huang, Wen, et al.
Published: (2024)

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)

Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization
by: Liu, Bei, et al.
Published: (2024)

USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)

Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)

WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
by: Wang, Shuai, et al.
Published: (2024)

Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
by: Chen, Zhengyang, et al.
Published: (2024)

M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions
by: Wang, Shuai, et al.
Published: (2024)

USE: A Unified Model for Universal Sound Separation and Extraction
by: Wang, Hongyu, et al.
Published: (2025)

Detect, Attend and Extract: Keyword Guided Target Speaker Extraction
by: Li, Haoyu, et al.
Published: (2026)

Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array
by: Qiao, Yue, et al.
Published: (2024)

Dual-LoRA: Parameter-Efficient Adversarial Disentanglement for Cross-Lingual Speaker Verification
by: Shangguan, Qituan, et al.
Published: (2026)

Emotional Styles Hide in Deep Speaker Embeddings: Disentangle Deep Speaker Embeddings for Speaker Clustering
by: Lin, Chaohao, et al.
Published: (2025)

Guided Speaker Embedding
by: Horiguchi, Shota, et al.
Published: (2024)

Time-Layer Adaptive Alignment for Speaker Similarity in Flow-Matching Based Zero-Shot TTS
by: Li, Haoyu, et al.
Published: (2025)

Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)

Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion
by: Chen, Zhengyang, et al.
Published: (2024)

Mitigating Non-Target Speaker Bias in Guided Speaker Embedding
by: Horiguchi, Shota, et al.
Published: (2025)

USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
by: Zeng, Bang, et al.
Published: (2024)

Unsupervised Single-Channel Speech Separation with a Diffusion Prior under Speaker-Embedding Guidance
by: Shi, Runwu, et al.
Published: (2025)

EvoTSE: Evolving Enrollment for Target Speaker Extraction
by: Liu, Zikai, et al.
Published: (2026)

Interpreting the Dimensions of Speaker Embedding Space
by: Huckvale, Mark
Published: (2025)

Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings
by: Horiguchi, Shota, et al.
Published: (2024)

Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
by: Thienpondt, Jenthe, et al.
Published: (2024)

PhiNet: Speaker Verification with Phonetic Interpretability
by: Ma, Yi, et al.
Published: (2026)

Adaptive Speaker Embedding Self-Augmentation for Personal Voice Activity Detection with Short Enrollment Speech
by: Feng, Fuyuan, et al.
Published: (2026)

Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection
by: Zeng, Bang, et al.
Published: (2025)

Xi+: Uncertainty Supervision for Robust Speaker Embedding
by: Li, Junjie, et al.
Published: (2025)

The Reasonable Effectiveness of Speaker Embeddings for Violence Detection
by: Jain, Sarthak, et al.
Published: (2024)

Explaining Speaker and Spoof Embeddings via Probing
by: Liu, Xuechen, et al.
Published: (2024)

SEED: Speaker Embedding Enhancement Diffusion Model
by: Nam, KiHyun, et al.
Published: (2025)

Robust Pitch Estimation and Tracking for Speakers Based on Subband Encoding and the Generalized Labeled Multi-Bernoulli Filter
by: Lin, Shoufeng
Published: (2026)

SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
by: Li, Junjie, et al.
Published: (2023)

Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
by: Shao, Yiwen, et al.
Published: (2024)

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)

3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization
by: Chen, Yafeng, et al.
Published: (2024)

From Who Said What to Who They Are: Modular Training-free Identity-Aware LLM Refinement of Speaker Diarization
by: Chen, Yu-Wen, et al.
Published: (2025)

Can Audio Large Language Models Verify Speaker Identity?
by: Ren, Yiming, et al.
Published: (2025)