Bibliographic Details
Main Authors:	Wang, Wei; Zhang, Wangyou; Li, Chenda; Wang, Jiahe; Cornell, Samuele; Sach, Marvin; Saijo, Kohei; Fu, Yihui; Ni, Zhaoheng; Han, Bing; Gong, Xun; Bi, Mengxiao; Fingscheidt, Tim; Watanabe, Shinji; Qian, Yanmin
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2601.18438

Similar Items

ICASSP 2026 URGENT Speech Enhancement Challenge
by: Li, Chenda, et al.
Published: (2026)

URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition
by: Wang, Jiahe, et al.
Published: (2025)

Less is More: Data Curation Matters in Scaling Speech Enhancement
by: Li, Chenda, et al.
Published: (2025)

Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
by: Zhang, Wangyou, et al.
Published: (2025)

Interspeech 2025 URGENT Speech Enhancement Challenge
by: Saijo, Kohei, et al.
Published: (2025)

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)

P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge
by: Sach, Marvin, et al.
Published: (2025)

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)

Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement
by: Li, Chenda, et al.
Published: (2024)

Toward Universal Speech Enhancement for Diverse Input Conditions
by: Zhang, Wangyou, et al.
Published: (2023)

Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment
by: Wang, Wei, et al.
Published: (2025)

Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
by: Zhang, Leying, et al.
Published: (2024)

MeanSE: Efficient Generative Speech Enhancement with Mean Flows
by: Wang, Jiahe, et al.
Published: (2025)

MAPSS: Manifold-based Assessment of Perceptual Source Separation
by: Ivry, Amir, et al.
Published: (2025)

Improving Design of Input Condition Invariant Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)

PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning
by: Shi, Jiatong, et al.
Published: (2025)

Non-Causal to Causal SSL-Supported Transfer Learning: Towards a High-Performance Low-Latency Speech Vocoder
by: Shi, Renzheng, et al.
Published: (2024)

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)

Who Spoke What When? Evaluating Spoken Language Models for Conversational ASR with Semantic and Overlap-Aware Metrics
by: Tawara, Naohiro, et al.
Published: (2026)

Representation-Regularized Convolutional Audio Transformer for Audio Understanding
by: Han, Bing, et al.
Published: (2026)

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
by: Wu, Yihan, et al.
Published: (2024)

DisContSE: Single-Step Diffusion Speech Enhancement Based on Joint Discrete and Continuous Embeddings
by: Fu, Yihui, et al.
Published: (2026)

Mind the Gap: Impact of Synthetic Conversational Data on Multi-Talker ASR and Speaker Diarization
by: Polok, Alexander, et al.
Published: (2026)

Cross-Talk Speech Reduction, by Separation, for Separation
by: Wang, Zhong-Qiu, et al.
Published: (2026)

ARECHO: Autoregressive Evaluation via Chain-Based Hypothesis Optimization for Speech Multi-Metric Estimation
by: Shi, Jiatong, et al.
Published: (2025)

A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
by: Saijo, Kohei, et al.
Published: (2025)

Input-Adaptive Spectral Feature Compression by Sequence Modeling for Source Separation
by: Saijo, Kohei, et al.
Published: (2026)

Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation
by: Saijo, Kohei, et al.
Published: (2025)

USE: A Unified Model for Universal Sound Separation and Extraction
by: Wang, Hongyu, et al.
Published: (2025)

Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
by: Zhang, Leying, et al.
Published: (2025)

DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice
by: Zhang, Leying, et al.
Published: (2026)

Ring Mixing with Auxiliary Signal-to-Consistency-Error Ratio Loss for Unsupervised Denoising in Speech Separation
by: Maciejewski, Matthew, et al.
Published: (2026)

Exploiting Noise Inseparability for Weakly-Supervised Discriminative Speech Denoising Using Noisy Targets
by: Maciejewski, Matthew, et al.
Published: (2026)

OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder
by: Bharadwaj, Shikhar, et al.
Published: (2025)

The CMU-AIST submission for the ICME 2025 Audio Encoder Challenge
by: Bharadwaj, Shikhar, et al.
Published: (2026)

The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
by: Cornell, Samuele, et al.
Published: (2024)

On the Distillation Loss Functions of Speech VAE for Unified Reconstruction, Understanding, and Generation
by: Cheng, Changhao, et al.
Published: (2026)

SLM-SS: Speech Language Model for Generative Speech Separation
by: Li, Tianhua, et al.
Published: (2026)

ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration
by: Someki, Masao, et al.
Published: (2024)

Preferences in AI algorithms: The need for relevant risk attitudes in automated decisions under uncertainties
by: Elisabeth Paté‐Cornell
Published: (2024)