:: Library Catalog

Bibliographic Details
Main Authors:	de Oliveira, Danilo; Peer, Tal; Rochdi, Jonas; Gerkmann, Timo
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2510.21317

Similar Items

LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models
by: Richter, Julius, et al.
Published: (2025)

Investigating Training Objectives for Generative Speech Enhancement
by: Richter, Julius, et al.
Published: (2024)

Do We Need EMA for Diffusion-Based Speech Enhancement? Toward a Magnitude-Preserving Network Architecture
by: Richter, Julius, et al.
Published: (2025)

Real-Time Streaming Mel Vocoding with Generative Flow Matching
by: Welker, Simon, et al.
Published: (2025)

Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement
by: de Oliveira, Danilo, et al.
Published: (2026)

The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement
by: de Oliveira, Danilo, et al.
Published: (2024)

Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech
by: de Oliveira, Danilo, et al.
Published: (2024)

An Analysis of the Variance of Diffusion-based Speech Enhancement
by: Lay, Bunlong, et al.
Published: (2024)

Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters
by: Tesch, Kristina, et al.
Published: (2023)

Bone-conduction Guided Multimodal Speech Enhancement with Conditional Diffusion Models
by: Khanagha, Sina, et al.
Published: (2026)

Are Modern Speech Enhancement Systems Vulnerable to Adversarial Attacks?
by: Makarov, Rostislav, et al.
Published: (2025)

Speech Enhancement and Dereverberation with Diffusion-based Generative Models
by: Richter, Julius, et al.
Published: (2022)

Diffusion Buffer for Online Generative Speech Enhancement
by: Lay, Bunlong, et al.
Published: (2025)

ReverbFX: A Dataset of Room Impulse Responses Derived from Reverb Effect Plugins for Singing Voice Dereverberation
by: Richter, Julius, et al.
Published: (2025)

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
by: Lemercier, Jean-Marie, et al.
Published: (2022)

Steering Deep Non-Linear Spatially Selective Filters for Weakly Guided Extraction of Moving Speakers in Dynamic Scenarios
by: Kienegger, Jakob, et al.
Published: (2025)

Autoregressive Guidance of Deep Spatially Selective Filters using Bayesian Tracking for Efficient Extraction of Moving Speakers
by: Kienegger, Jakob, et al.
Published: (2026)

Adaptive Rotary Steering with Joint Autoregression for Robust Extraction of Closely Moving Speakers in Dynamic Scenarios
by: Kienegger, Jakob, et al.
Published: (2026)

EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data
by: Prabhu, Navin Raj, et al.
Published: (2023)

Single and Few-step Diffusion for Generative Speech Enhancement
by: Lay, Bunlong, et al.
Published: (2023)

Mask-Weighted Spatial Likelihood Coding for Speaker-Independent Joint Localization and Mask Estimation
by: Kienegger, Jakob, et al.
Published: (2024)

Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model
by: Lemercier, Jean-Marie, et al.
Published: (2023)

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
by: Richter, Julius, et al.
Published: (2024)

Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance
by: Kienegger, Jakob, et al.
Published: (2025)

BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models
by: Moliner, Eloi, et al.
Published: (2024)

Unsupervised Blind Joint Dereverberation and Room Acoustics Estimation with Diffusion Models
by: Lemercier, Jean-Marie, et al.
Published: (2024)

The Voice Behind the Words: Quantifying Intersectional Bias in SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2026)

Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining
by: Cheng, Ruoxi, et al.
Published: (2024)

Enhancing In-the-Wild Speech Emotion Conversion with Resynthesis-based Duration Modeling
by: Prabhu, Navin Raj, et al.
Published: (2025)

Diffusion Models for Audio Restoration
by: Lemercier, Jean-Marie, et al.
Published: (2024)

HRTF Estimation using a Score-based Prior
by: Thuillier, Etienne, et al.
Published: (2024)

Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models
by: Jing, Ruihao, et al.
Published: (2025)

Can LLMs Help Localize Fake Words in Partially Fake Speech?
by: Zhang, Lin, et al.
Published: (2026)

Integrating Pause Information with Word Embeddings in Language Models for Alzheimer's Disease Detection from Spontaneous Speech
by: Pu, Yu, et al.
Published: (2025)

A Fast Solver for Interpolating Stochastic Differential Equation Diffusion Models for Speech Restoration
by: Lay, Bunlong, et al.
Published: (2026)

Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
by: Wang, Tianrui, et al.
Published: (2025)

Quantifying Dimensional Independence in Speech: An Information-Theoretic Framework for Disentangled Representation Learning
by: Kashyap, Bipasha, et al.
Published: (2026)

Word Level Timestamp Generation for Automatic Speech Recognition and Translation
by: Hu, Ke, et al.
Published: (2025)

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
by: Ku, Pin-Jui, et al.
Published: (2024)

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
by: Li, Jiaqi, et al.
Published: (2024)