:: Library Catalog

Bibliographic Details
Main Authors:	de Oliveira, Danilo; Peer, Tal; Gerkmann, Timo
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2605.12107

Similar Items

Are These Even Words? Quantifying the Gibberishness of Generative Speech Models
by: de Oliveira, Danilo, et al.
Published: (2025)

LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models
by: Richter, Julius, et al.
Published: (2025)

Investigating Training Objectives for Generative Speech Enhancement
by: Richter, Julius, et al.
Published: (2024)

Do We Need EMA for Diffusion-Based Speech Enhancement? Toward a Magnitude-Preserving Network Architecture
by: Richter, Julius, et al.
Published: (2025)

The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement
by: de Oliveira, Danilo, et al.
Published: (2024)

Are Modern Speech Enhancement Systems Vulnerable to Adversarial Attacks?
by: Makarov, Rostislav, et al.
Published: (2025)

An Analysis of the Variance of Diffusion-based Speech Enhancement
by: Lay, Bunlong, et al.
Published: (2024)

Enhancing In-the-Wild Speech Emotion Conversion with Resynthesis-based Duration Modeling
by: Prabhu, Navin Raj, et al.
Published: (2025)

Real-Time Streaming Mel Vocoding with Generative Flow Matching
by: Welker, Simon, et al.
Published: (2025)

A Fast Solver for Interpolating Stochastic Differential Equation Diffusion Models for Speech Restoration
by: Lay, Bunlong, et al.
Published: (2026)

Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech
by: de Oliveira, Danilo, et al.
Published: (2024)

Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency
by: Lay, Bunlong, et al.
Published: (2025)

Bone-conduction Guided Multimodal Speech Enhancement with Conditional Diffusion Models
by: Khanagha, Sina, et al.
Published: (2026)

Diffusion Buffer for Online Generative Speech Enhancement
by: Lay, Bunlong, et al.
Published: (2025)

Speech Enhancement and Dereverberation with Diffusion-based Generative Models
by: Richter, Julius, et al.
Published: (2022)

Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters
by: Tesch, Kristina, et al.
Published: (2023)

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
by: Lemercier, Jean-Marie, et al.
Published: (2022)

Single and Few-step Diffusion for Generative Speech Enhancement
by: Lay, Bunlong, et al.
Published: (2023)

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
by: Richter, Julius, et al.
Published: (2024)

Robustness of Speech Separation Models for Similar-pitch Speakers
by: Lay, Bunlong, et al.
Published: (2024)

A Semi-spontaneous Dutch Speech Dataset for Speech Enhancement and Speech Recognition
by: de Groot, Dimme, et al.
Published: (2026)

Automatic Speech Recognition in the Modern Era: Architectures, Training, and Evaluation
by: Nayeem, Md., et al.
Published: (2025)

Fairness of Automatic Speech Recognition in Cleft Lip and Palate Speech
by: Bhattacharjee, Susmita, et al.
Published: (2025)

Latent-Level Enhancement with Flow Matching for Robust Automatic Speech Recognition
by: Yang, Da-Hee, et al.
Published: (2026)

Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMs
by: Aronowitz, Hagai, et al.
Published: (2026)

EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data
by: Prabhu, Navin Raj, et al.
Published: (2023)

Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
by: Chen, Peikun, et al.
Published: (2024)

Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement
by: Nasretdinov, Rauf, et al.
Published: (2025)

SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
by: Du, Jiayu, et al.
Published: (2024)

The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge
by: Tian, Jingguang, et al.
Published: (2024)

An Analysis of Joint Nonlinear Spatial Filtering for Spatial Aliasing Reduction
by: Mannanova, Alina, et al.
Published: (2025)

Using Songs to Improve Kazakh Automatic Speech Recognition
by: Yeshpanov, Rustem
Published: (2026)

Unsupervised Online Continual Learning for Automatic Speech Recognition
by: Eeckt, Steven Vander, et al.
Published: (2024)

Non-Intrusive Automatic Speech Recognition Refinement: A Survey
by: Peyghan, Mohammad Reza, et al.
Published: (2025)

Objective and Subjective Evaluation of Diffusion-Based Speech Enhancement for Dysarthric Speech
by: de Groot, Dimme, et al.
Published: (2025)

Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora
by: Nespoli, Francesco, et al.
Published: (2024)

ReverbFX: A Dataset of Room Impulse Responses Derived from Reverb Effect Plugins for Singing Voice Dereverberation
by: Richter, Julius, et al.
Published: (2025)

A Large Dataset of Spontaneous Speech with the Accent Spoken in São Paulo for Automatic Speech Recognition Evaluation
by: Lima, Rodrigo, et al.
Published: (2024)

Disentangled-Transformer: An Explainable End-to-End Automatic Speech Recognition Model with Speech Content-Context Separation
by: Wang, Pu, et al.
Published: (2024)

Automatic Speech Recognition for Hindi
by: Saha, Anish, et al.
Published: (2024)