:: Library Catalog

Bibliographic Details
Main Authors:	Richter, Julius; Masuyama, Yoshiki; Boeddeker, Christoph; Edo, Takahiro; Wichern, Gordon; Le Roux, Jonathan
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing; Machine Learning
Online Access:	https://arxiv.org/abs/2605.06189

Similar Items

Exploring Disentangled Neural Speech Codecs from Self-Supervised Representations
by: Aihara, Ryo, et al.
Published: (2025)

FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement
by: Masuyama, Yoshiki, et al.
Published: (2025)

Direction-Aware Neural Acoustic Fields for Few-Shot Interpolation of Ambisonic Impulse Responses
by: Ick, Christopher, et al.
Published: (2025)

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings
by: Boeddeker, Christoph, et al.
Published: (2023)

Data Augmentation Using Neural Acoustic Fields With Retrieval-Augmented Pre-training
by: Ick, Christopher, et al.
Published: (2025)

Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization
by: Masuyama, Yoshiki, et al.
Published: (2025)

Physics-Informed Direction-Aware Neural Acoustic Fields
by: Masuyama, Yoshiki, et al.
Published: (2025)

Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling
by: Masuyama, Yoshiki, et al.
Published: (2026)

FasTUSS: Faster Task-Aware Unified Source Separation
by: Paissan, Francesco, et al.
Published: (2025)

TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement
by: Saijo, Kohei, et al.
Published: (2024)

SUNAC: Source-aware Unified Neural Audio Codec
by: Aihara, Ryo, et al.
Published: (2025)

Enhanced Reverberation as Supervision for Unsupervised Speech Separation
by: Saijo, Kohei, et al.
Published: (2024)

NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization
by: Masuyama, Yoshiki, et al.
Published: (2024)

Speech Enhancement and Dereverberation with Diffusion-based Generative Models
by: Richter, Julius, et al.
Published: (2022)

Task-Aware Unified Source Separation
by: Saijo, Kohei, et al.
Published: (2024)

The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement
by: de Oliveira, Danilo, et al.
Published: (2024)

Single and Few-step Diffusion for Generative Speech Enhancement
by: Lay, Bunlong, et al.
Published: (2023)

Combining TF-GridNet and Mixture Encoder for Continuous Speech Separation for Meeting Transcription
by: Vieting, Peter, et al.
Published: (2023)

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
by: Lemercier, Jean-Marie, et al.
Published: (2022)

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
by: Richter, Julius, et al.
Published: (2024)

Why does music source separation benefit from cacophony?
by: Jeon, Chang-Bin, et al.
Published: (2024)

Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
by: Masuyama, Yoshiki, et al.
Published: (2024)

Mind the Gap: Detecting Cluster Exits for Robust Local Density-Based Score Normalization in Anomalous Sound Detection
by: Wilkinghoff, Kevin, et al.
Published: (2026)

Sound Event Bounding Boxes
by: Ebbers, Janek, et al.
Published: (2024)

SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
by: Koo, Junghyun, et al.
Published: (2024)

Exploring the Capability of Mamba in Speech Applications
by: Miyazaki, Koichi, et al.
Published: (2024)

Local Density-Based Anomaly Score Normalization for Domain Generalization
by: Wilkinghoff, Kevin, et al.
Published: (2025)

HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement
by: Hussein, Amir, et al.
Published: (2025)

SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis
by: Baoueb, Teysir, et al.
Published: (2024)

Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech
by: de Oliveira, Danilo, et al.
Published: (2024)

Microphone Array Signal Processing and Deep Learning for Speech Enhancement
by: Haeb-Umbach, Reinhold, et al.
Published: (2025)

Investigating Training Objectives for Generative Speech Enhancement
by: Richter, Julius, et al.
Published: (2024)

Factorized RVQ-GAN For Disentangled Speech Tokenization
by: Khurana, Sameer, et al.
Published: (2025)

Simultaneous Diarization and Separation of Meetings through the Integration of Statistical Mixture Models
by: Cord-Landwehr, Tobias, et al.
Published: (2024)

Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization
by: von Neumann, Thilo, et al.
Published: (2023)

Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
by: Saijo, Kohei, et al.
Published: (2024)

Diffusion Buffer for Online Generative Speech Enhancement
by: Lay, Bunlong, et al.
Published: (2025)

GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model
by: Liu, Haocheng, et al.
Published: (2024)

Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
by: Wang, Wupeng, et al.
Published: (2025)

Investigating the Effects of Diffusion-based Conditional Generative Speech Models Used for Speech Enhancement on Dysarthric Speech
by: Reszka, Joanna, et al.
Published: (2024)