Library Catalog

Bibliographic Details
Main Authors:	Wang, Hongyu; Li, Chenda; Zhou, Xin; Wang, Shuai; Qian, Yanmin
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2512.21215

Similar Items

Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement
by: Li, Chenda, et al.
Published: (2024)

MeanSE: Efficient Generative Speech Enhancement with Mean Flows
by: Wang, Jiahe, et al.
Published: (2025)

Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
by: Zhang, Leying, et al.
Published: (2024)

USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)

Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment
by: Wang, Wei, et al.
Published: (2025)

What Does the Speaker Embedding Encode?
by: Wang, Shuai, et al.
Published: (2025)

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)

Self-Guided Target Sound Extraction and Classification Through Universal Sound Separation Model and Multiple Clues
by: Kwon, Younghoo, et al.
Published: (2025)

Representation-Regularized Convolutional Audio Transformer for Audio Understanding
by: Han, Bing, et al.
Published: (2026)

WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
by: Wang, Shuai, et al.
Published: (2024)

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
by: Wang, Shuai, et al.
Published: (2024)

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)

Efficient Multilingual ASR Finetuning via LoRA Language Experts
by: Li, Jiahong, et al.
Published: (2025)

P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge
by: Sach, Marvin, et al.
Published: (2025)

DDTSE: Discriminative Diffusion Model for Target Speech Extraction
by: Zhang, Leying, et al.
Published: (2023)

Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)

Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
by: Huang, Wen, et al.
Published: (2024)

Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion
by: Chen, Zhengyang, et al.
Published: (2024)

ICASSP 2026 URGENT Speech Enhancement Challenge
by: Li, Chenda, et al.
Published: (2026)

Leveraging Sound Source Trajectories for Universal Sound Separation
by: Wu, Donghang, et al.
Published: (2024)

URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition
by: Wang, Jiahe, et al.
Published: (2025)

Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization
by: Liu, Bei, et al.
Published: (2024)

LuSeeL: Language-queried Binaural Universal Sound Event Extraction and Localization
by: Pan, Zexu, et al.
Published: (2026)

Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection
by: Han, Bing, et al.
Published: (2025)

From Sharpness to Better Generalization for Speech Deepfake Detection
by: Huang, Wen, et al.
Published: (2025)

Toward Universal Speech Enhancement for Diverse Input Conditions
by: Zhang, Wangyou, et al.
Published: (2023)

A Knowledge-Driven Approach to Target Speech Extraction in the Presence of Background Sound Effects for Cinematic Audio Source Separation (CASS)
by: Ho, Chun-wei, et al.
Published: (2026)

Less is More: Data Curation Matters in Scaling Speech Enhancement
by: Li, Chenda, et al.
Published: (2025)

Discrete Token Modeling for Multi-Stem Music Source Separation with Language Models
by: Lyu, Pengbo, et al.
Published: (2026)

Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
by: Zhang, Wangyou, et al.
Published: (2025)

EvoTSE: Evolving Enrollment for Target Speaker Extraction
by: Liu, Zikai, et al.
Published: (2026)

SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
by: Huang, Wen, et al.
Published: (2025)

Cross-attention Inspired Selective State Space Models for Target Sound Extraction
by: Wu, Donghang, et al.
Published: (2024)

Sound Separation and Classification with Object and Semantic Guidance
by: Kwon, Younghoo, et al.
Published: (2025)

DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification
by: Lee, Dongheon, et al.
Published: (2024)

Universal Sound Separation with Self-Supervised Audio Masked Autoencoder
by: Zhao, Junqi, et al.
Published: (2024)

Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)

AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection
by: Jiang, Anbai, et al.
Published: (2024)

Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation
by: Huang, Wen, et al.
Published: (2025)