Saved in:
| Main Authors: | Wang, Hongyu, Li, Chenda, Zhou, Xin, Wang, Shuai, Qian, Yanmin |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.21215 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement
by: Li, Chenda, et al.
Published: (2024)
by: Li, Chenda, et al.
Published: (2024)
MeanSE: Efficient Generative Speech Enhancement with Mean Flows
by: Wang, Jiahe, et al.
Published: (2025)
by: Wang, Jiahe, et al.
Published: (2025)
Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
by: Zhang, Leying, et al.
Published: (2024)
by: Zhang, Leying, et al.
Published: (2024)
USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)
by: Ao, Junyi, et al.
Published: (2023)
Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment
by: Wang, Wei, et al.
Published: (2025)
by: Wang, Wei, et al.
Published: (2025)
What Does the Speaker Embedding Encode?
by: Wang, Shuai, et al.
Published: (2025)
by: Wang, Shuai, et al.
Published: (2025)
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)
by: Zhang, Wangyou, et al.
Published: (2024)
Self-Guided Target Sound Extraction and Classification Through Universal Sound Separation Model and Multiple Clues
by: Kwon, Younghoo, et al.
Published: (2025)
by: Kwon, Younghoo, et al.
Published: (2025)
Representation-Regularized Convolutional Audio Transformer for Audio Understanding
by: Han, Bing, et al.
Published: (2026)
by: Han, Bing, et al.
Published: (2026)
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
by: Wang, Shuai, et al.
Published: (2024)
by: Wang, Shuai, et al.
Published: (2024)
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
by: Wang, Shuai, et al.
Published: (2024)
by: Wang, Shuai, et al.
Published: (2024)
URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)
by: Zhang, Wangyou, et al.
Published: (2024)
Efficient Multilingual ASR Finetuning via LoRA Language Experts
by: Li, Jiahong, et al.
Published: (2025)
by: Li, Jiahong, et al.
Published: (2025)
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge
by: Sach, Marvin, et al.
Published: (2025)
by: Sach, Marvin, et al.
Published: (2025)
DDTSE: Discriminative Diffusion Model for Target Speech Extraction
by: Zhang, Leying, et al.
Published: (2023)
by: Zhang, Leying, et al.
Published: (2023)
Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)
by: Yang, Yexin, et al.
Published: (2025)
Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)
by: Chen, Zhengyang, et al.
Published: (2024)
Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
by: Huang, Wen, et al.
Published: (2024)
by: Huang, Wen, et al.
Published: (2024)
Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion
by: Chen, Zhengyang, et al.
Published: (2024)
by: Chen, Zhengyang, et al.
Published: (2024)
ICASSP 2026 URGENT Speech Enhancement Challenge
by: Li, Chenda, et al.
Published: (2026)
by: Li, Chenda, et al.
Published: (2026)
Leveraging Sound Source Trajectories for Universal Sound Separation
by: Wu, Donghang, et al.
Published: (2024)
by: Wu, Donghang, et al.
Published: (2024)
URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition
by: Wang, Jiahe, et al.
Published: (2025)
by: Wang, Jiahe, et al.
Published: (2025)
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization
by: Liu, Bei, et al.
Published: (2024)
by: Liu, Bei, et al.
Published: (2024)
LuSeeL: Language-queried Binaural Universal Sound Event Extraction and Localization
by: Pan, Zexu, et al.
Published: (2026)
by: Pan, Zexu, et al.
Published: (2026)
Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection
by: Han, Bing, et al.
Published: (2025)
by: Han, Bing, et al.
Published: (2025)
From Sharpness to Better Generalization for Speech Deepfake Detection
by: Huang, Wen, et al.
Published: (2025)
by: Huang, Wen, et al.
Published: (2025)
Toward Universal Speech Enhancement for Diverse Input Conditions
by: Zhang, Wangyou, et al.
Published: (2023)
by: Zhang, Wangyou, et al.
Published: (2023)
A Knowledge-Driven Approach to Target Speech Extraction in the Presence of Background Sound Effects for Cinematic Audio Source Separation (CASS)
by: Ho, Chun-wei, et al.
Published: (2026)
by: Ho, Chun-wei, et al.
Published: (2026)
Less is More: Data Curation Matters in Scaling Speech Enhancement
by: Li, Chenda, et al.
Published: (2025)
by: Li, Chenda, et al.
Published: (2025)
Discrete Token Modeling for Multi-Stem Music Source Separation with Language Models
by: Lyu, Pengbo, et al.
Published: (2026)
by: Lyu, Pengbo, et al.
Published: (2026)
Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
by: Zhang, Wangyou, et al.
Published: (2025)
by: Zhang, Wangyou, et al.
Published: (2025)
EvoTSE: Evolving Enrollment for Target Speaker Extraction
by: Liu, Zikai, et al.
Published: (2026)
by: Liu, Zikai, et al.
Published: (2026)
SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
by: Huang, Wen, et al.
Published: (2025)
by: Huang, Wen, et al.
Published: (2025)
Cross-attention Inspired Selective State Space Models for Target Sound Extraction
by: Wu, Donghang, et al.
Published: (2024)
by: Wu, Donghang, et al.
Published: (2024)
Sound Separation and Classification with Object and Semantic Guidance
by: Kwon, Younghoo, et al.
Published: (2025)
by: Kwon, Younghoo, et al.
Published: (2025)
DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification
by: Lee, Dongheon, et al.
Published: (2024)
by: Lee, Dongheon, et al.
Published: (2024)
Universal Sound Separation with Self-Supervised Audio Masked Autoencoder
by: Zhao, Junqi, et al.
Published: (2024)
by: Zhao, Junqi, et al.
Published: (2024)
Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)
by: Zhang, Ke, et al.
Published: (2024)
AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection
by: Jiang, Anbai, et al.
Published: (2024)
by: Jiang, Anbai, et al.
Published: (2024)
Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation
by: Huang, Wen, et al.
Published: (2025)
by: Huang, Wen, et al.
Published: (2025)
Similar Items
-
Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement
by: Li, Chenda, et al.
Published: (2024) -
MeanSE: Efficient Generative Speech Enhancement with Mean Flows
by: Wang, Jiahe, et al.
Published: (2025) -
Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
by: Zhang, Leying, et al.
Published: (2024) -
USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023) -
Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment
by: Wang, Wei, et al.
Published: (2025)