Library Catalog

Bibliographic Details
Main Authors:	Li, Haoyang, Zhuang, Xuyi, Adnan, Azmat, Ni, Ye, Rao, Wei, Gopal, Shreyas, Chng, Eng Siong
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2512.20978

Similar Items

EvoTSE: Evolving Enrollment for Target Speaker Extraction
by: Liu, Zikai, et al.
Published: (2026)

Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification
by: Truong, Duc-Tuan, et al.
Published: (2023)

Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization via Neural Audio Codec and Language Models
by: Kuzmin, Nikita, et al.
Published: (2026)

EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
by: Yao, Jixun, et al.
Published: (2025)

USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
by: Zeng, Bang, et al.
Published: (2024)

Analysis of Speaker Verification Performance Trade-offs with Neural Audio Codec Transmission
by: Thakur, Nirmalya Mallick, et al.
Published: (2025)

Bi-directional Context-Enhanced Speech Large Language Models for Multilingual Conversational ASR
by: Peng, Yizhou, et al.
Published: (2025)

Improving Code-Switching Speech Recognition with TTS Data Augmentation
by: Yeo, Yue Heng, et al.
Published: (2026)

FlowTSE: Target Speaker Extraction with Flow Matching
by: Navon, Aviv, et al.
Published: (2025)

Aligning Generative Speech Enhancement with Perceptual Feedback
by: Li, Haoyang, et al.
Published: (2025)

StreamVoiceAnon+: Emotion-Preserving Streaming Speaker Anonymization via Frame-Level Acoustic Distillation
by: Kuzmin, Nikita, et al.
Published: (2026)

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
by: Yao, Jixun, et al.
Published: (2025)

MeanFlow-TSE: One-Step Generative Target Speaker Extraction with Mean Flow
by: Shimizu, Riki, et al.
Published: (2025)

Aligning Speech to Languages to Enhance Code-switching Speech Recognition
by: Liu, Hexin, et al.
Published: (2024)

Training-Free Intelligibility-Guided Observation Addition for Noisy ASR
by: Li, Haoyang, et al.
Published: (2026)

pTSE-T: Presentation Target Speaker Extraction using Unaligned Text Cues
by: Jiang, Ziyang, et al.
Published: (2024)

Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization
by: Tao, Ruijie, et al.
Published: (2024)

Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation
by: Liu, Changsong, et al.
Published: (2025)

LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
by: Luong, Hieu-Thi, et al.
Published: (2024)

Speech Enhancement Using Continuous Embeddings of Neural Audio Codec
by: Li, Haoyang, et al.
Published: (2025)

Code-switching Speech Recognition Under the Lens: Model- and Data-Centric Perspectives
by: Liu, Hexin, et al.
Published: (2025)

Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning
by: He, Haorui, et al.
Published: (2024)

3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications
by: He, Shulin, et al.
Published: (2023)

Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
by: Yuhang, Yang, et al.
Published: (2024)

Speech Separation using Neural Audio Codecs with Embedding Loss
by: Yip, Jia Qi, et al.
Published: (2024)

Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech
by: Li, Yuxin, et al.
Published: (2025)

C²AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction
by: Wu, Wenxuan, et al.
Published: (2025)

UniArray: Unified Spectral-Spatial Modeling for Array-Geometry-Agnostic Speech Separation
by: Chen, Weiguang, et al.
Published: (2025)

Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
by: Hu, Yuchen, et al.
Published: (2023)

Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems
by: Kwok, Chin Yuen, et al.
Published: (2024)

From KAN to GR-KAN: Advancing Speech Enhancement with KAN-Based Methodology
by: Li, Haoyang, et al.
Published: (2024)

Room Impulse Responses help attackers to evade Deep Fake Detection
by: Luong, Hieu-Thi, et al.
Published: (2024)

Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model
by: Ma, Ziyang, et al.
Published: (2025)

TSE-PI: Target Sound Extraction under Reverberant Environments with Pitch Information
by: Wang, Yiwen, et al.
Published: (2024)

Enhancing Target Speaker Extraction with Explicit Speaker Consistency Modeling
by: Wu, Shu, et al.
Published: (2025)

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
by: Chen, Chen, et al.
Published: (2024)

Speechless: Speech Instruction Training Without Speech for Low Resource Languages
by: Dao, Alan, et al.
Published: (2025)

Robust Localization of Partially Fake Speech: Metrics and Out-of-Domain Evaluation
by: Luong, Hieu-Thi, et al.
Published: (2025)

NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025
by: Peng, Yizhou, et al.
Published: (2025)

FD-Bench: A Full-Duplex Benchmarking Pipeline Designed for Full Duplex Spoken Dialogue Systems
by: Peng, Yizhou, et al.
Published: (2025)