:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Li, Junjie, Lee, Kong Aik
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2601.15719
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Cosine Scoring with Uncertainty for Neural Speaker Embedding
by: Wang, Qiongqiong, et al.
Published: (2024)

Xi+: Uncertainty Supervision for Robust Speaker Embedding
by: Li, Junjie, et al.
Published: (2025)

On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
by: Li, Junjie, et al.
Published: (2024)

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
by: Wang, Shuai, et al.
Published: (2024)

MeMo: Attentional Momentum for Real-time Audio-visual Speaker Extraction under Impaired Visual Conditions
by: Li, Junjie, et al.
Published: (2025)

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
by: Liu, Tianchi, et al.
Published: (2023)

Any-to-any Speaker Attribute Perturbation for Asynchronous Voice Anonymization
by: Chen, Liping, et al.
Published: (2025)

Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification
by: Truong, Duc-Tuan, et al.
Published: (2023)

Generalizing Speaker Verification for Spoof Awareness in the Embedding Space
by: Liu, Xuechen, et al.
Published: (2024)

Text-dependent Speaker Verification (TdSV) Challenge 2024: Challenge Evaluation Plan
by: Hossein, Zeinali, et al.
Published: (2024)

MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues
by: Li, Junjie, et al.
Published: (2024)

Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis
by: Wang, Xin, et al.
Published: (2024)

On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection
by: Guo, Chenyang, et al.
Published: (2024)

VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis
by: Lin, Weiwei, et al.
Published: (2024)

QAMO: Quality-aware Multi-centroid One-class Learning For Speech Deepfake Detection
by: Truong, Duc-Tuan, et al.
Published: (2025)

Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection
by: Truong, Duc-Tuan, et al.
Published: (2025)

Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding
by: Zhang, Xin, et al.
Published: (2025)

Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio
by: Ma, Yi, et al.
Published: (2024)

A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition
by: Zhou, Zhenyu, et al.
Published: (2024)

The First Voice Timbre Attribute Detection Challenge
by: Chen, Liping, et al.
Published: (2025)

Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)

LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
by: Luong, Hieu-Thi, et al.
Published: (2024)

Room Impulse Responses help attackers to evade Deep Fake Detection
by: Luong, Hieu-Thi, et al.
Published: (2024)

ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation
by: Gong, Junmin, et al.
Published: (2026)

Joint Learning Global-Local Speaker Classification to Enhance End-to-End Speaker Diarization and Recognition
by: Dai, Yuhang, et al.
Published: (2026)

Robust Localization of Partially Fake Speech: Metrics and Out-of-Domain Evaluation
by: Luong, Hieu-Thi, et al.
Published: (2025)

Adversarial speech for voice privacy protection from Personalized Speech generation
by: Chen, Shihao, et al.
Published: (2024)

Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-spoofing
by: Liu, Tianchi, et al.
Published: (2025)

SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
by: Yin, Han, et al.
Published: (2025)

Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion
by: Li, Xiao, et al.
Published: (2025)

The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan
by: Sheng, Zhengyan, et al.
Published: (2025)

Introducing voice timbre attribute detection
by: He, Jinghao, et al.
Published: (2025)

Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation
by: Wang, Shiyao, et al.
Published: (2024)

Speaker Recognition -- Wavelet Packet Based Multiresolution Feature Extraction Approach
by: Bhardwaj, Saurabh, et al.
Published: (2025)

ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
by: Kong, Jungil, et al.
Published: (2023)

Multi-Target Backdoor Attacks Against Speaker Recognition
by: Fortier, Alexandrine, et al.
Published: (2025)

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
by: Tao, Ruijie, et al.
Published: (2024)

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)

Breaking Speaker Recognition with PaddingBack
by: Ye, Zhe, et al.
Published: (2023)

SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition
by: Wang, Tianhao, et al.
Published: (2024)