:: Library Catalog

Bibliographic Details
Main Authors:	Ayilo, Jean-Eudes; Sadeghi, Mostafa; Serizel, Romain; Alameda-Pineda, Xavier
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2601.09931

Similar Items

Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement
by: Sadeghi, Mostafa, et al.
Published: (2025)

Diffusion-based Unsupervised Audio-visual Speech Enhancement
by: Ayilo, Jean-Eudes, et al.
Published: (2024)

Frequency-Weighted Training Losses for Phoneme-Level DNN-based Speech Enhancement
by: Monir, Nasser-Eddine, et al.
Published: (2025)

A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms
by: Monir, Nasser-Eddine, et al.
Published: (2024)

Residual Tokens Enhance Masked Autoencoders for Speech Modeling
by: Sadok, Samir, et al.
Published: (2026)

Evaluating Multichannel Speech Enhancement Algorithms at the Phoneme Scale Across Genders
by: Monir, Nasser-Eddine, et al.
Published: (2025)

A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation
by: Airale, Louis, et al.
Published: (2023)

The Equalizer: Introducing Shape-Gain Decomposition in Neural Audio Codecs
by: Sadok, Samir, et al.
Published: (2026)

Modeling strategies for speech enhancement in the latent space of a neural audio codec
by: Kammoun, Sofiene, et al.
Published: (2025)

From Computation to Consumption: Exploring the Compute-Energy Link for Training and Testing Neural Networks for SED Systems
by: Douwes, Constance, et al.
Published: (2024)

Performance and energy balance: a comprehensive study of state-of-the-art sound event detection systems
by: Ronchini, Francesca, et al.
Published: (2023)

AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
by: Sadok, Samir, et al.
Published: (2025)

Metric Analysis for Spatial Semantic Segmentation of Sound Scenes
by: Mishra, Mayank, et al.
Published: (2025)

A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes
by: Ronchini, Francesca, et al.
Published: (2022)

Energy Consumption Trends in Sound Event Detection Systems
by: Douwes, Constance, et al.
Published: (2024)

The Costs of Reproducibility in Music Separation Research: a Replication of Band-Split RNN
by: Magron, Paul, et al.
Published: (2026)

Angular Distance Distribution Loss for Audio Classification
by: Almudévar, Antonio, et al.
Published: (2024)

Tracking of Intermittent and Moving Speakers: Dataset and Metrics
by: Iatariene, Taous, et al.
Published: (2025)

Self-Supervised Learning for Few-Shot Bird Sound Classification
by: Moummad, Ilyass, et al.
Published: (2023)

Regularized Contrastive Pre-training for Few-shot Bioacoustic Sound Detection
by: Moummad, Ilyass, et al.
Published: (2023)

Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
by: Passoni, Riccardo, et al.
Published: (2025)

Towards Low-Latency Tracking of Multiple Speakers With Short-Context Speaker Embeddings
by: Iatariene, Taous, et al.
Published: (2025)

Domain-Invariant Representation Learning of Bird Sounds
by: Moummad, Ilyass, et al.
Published: (2024)

A multimodal dynamical variational autoencoder for audiovisual speech representation learning
by: Sadok, Samir, et al.
Published: (2023)

The impact of non-target events in synthetic soundscapes for sound event detection
by: Ronchini, Francesca, et al.
Published: (2021)

Speaker Embeddings to Improve Tracking of Intermittent and Moving Speakers
by: Iatariene, Taous, et al.
Published: (2025)

Speech Enhancement and Dereverberation with Diffusion-based Generative Models
by: Richter, Julius, et al.
Published: (2022)

Latent Watermarking of Audio Generative Models
by: Roman, Robin San, et al.
Published: (2024)

A decade of DCASE: Achievements, practices, evaluations and future challenges
by: Mesaros, Annamaria, et al.
Published: (2024)

Diffusion-based Signal Refiner for Speech Enhancement and Separation
by: Hirano, Masato, et al.
Published: (2023)

Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds
by: Moummad, Ilyass, et al.
Published: (2024)

An Analysis of the Variance of Diffusion-based Speech Enhancement
by: Lay, Bunlong, et al.
Published: (2024)

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
by: Lemercier, Jean-Marie, et al.
Published: (2022)

Unsupervised Speech Enhancement using Data-defined Priors
by: Klement, Dominik, et al.
Published: (2025)

Absorbing Discrete Diffusion for Speech Enhancement
by: Gonzalez, Philippe
Published: (2026)

Unified Architecture and Unsupervised Speech Disentanglement for Speaker Embedding-Free Enrollment in Personalized Speech Enhancement
by: Huang, Ziling, et al.
Published: (2025)

Data-independent Beamforming for End-to-end Multichannel Multi-speaker ASR
by: Cui, Can, et al.
Published: (2025)

Diffusion-based Speech Enhancement with Schrödinger Bridge and Symmetric Noise Schedule
by: Wang, Siyi, et al.
Published: (2024)

Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement
by: Li, Chenda, et al.
Published: (2024)

ArtiFree: Detecting and Reducing Generative Artifacts in Diffusion-based Speech Enhancement
by: Chhaglani, Bhawana, et al.
Published: (2025)