| Main Authors: | Ayilo, Jean-Eudes, Sadeghi, Mostafa, Serizel, Romain, Alameda-Pineda, Xavier |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.09931 |
Similar Items
Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement
by: Sadeghi, Mostafa, et al.
Published: (2025)
Diffusion-based Unsupervised Audio-visual Speech Enhancement
by: Ayilo, Jean-Eudes, et al.
Published: (2024)
Frequency-Weighted Training Losses for Phoneme-Level DNN-based Speech Enhancement
by: Monir, Nasser-Eddine, et al.
Published: (2025)
A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms
by: Monir, Nasser-Eddine, et al.
Published: (2024)
Residual Tokens Enhance Masked Autoencoders for Speech Modeling
by: Sadok, Samir, et al.
Published: (2026)
Evaluating Multichannel Speech Enhancement Algorithms at the Phoneme Scale Across Genders
by: Monir, Nasser-Eddine, et al.
Published: (2025)
A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation
by: Airale, Louis, et al.
Published: (2023)
The Equalizer: Introducing Shape-Gain Decomposition in Neural Audio Codecs
by: Sadok, Samir, et al.
Published: (2026)
Modeling strategies for speech enhancement in the latent space of a neural audio codec
by: Kammoun, Sofiene, et al.
Published: (2025)
From Computation to Consumption: Exploring the Compute-Energy Link for Training and Testing Neural Networks for SED Systems
by: Douwes, Constance, et al.
Published: (2024)
Performance and energy balance: a comprehensive study of state-of-the-art sound event detection systems
by: Ronchini, Francesca, et al.
Published: (2023)
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
by: Sadok, Samir, et al.
Published: (2025)
Metric Analysis for Spatial Semantic Segmentation of Sound Scenes
by: Mishra, Mayank, et al.
Published: (2025)
A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes
by: Ronchini, Francesca, et al.
Published: (2022)
Energy Consumption Trends in Sound Event Detection Systems
by: Douwes, Constance, et al.
Published: (2024)
The Costs of Reproducibility in Music Separation Research: a Replication of Band-Split RNN
by: Magron, Paul, et al.
Published: (2026)
Angular Distance Distribution Loss for Audio Classification
by: Almudévar, Antonio, et al.
Published: (2024)
Tracking of Intermittent and Moving Speakers : Dataset and Metrics
by: Iatariene, Taous, et al.
Published: (2025)
Self-Supervised Learning for Few-Shot Bird Sound Classification
by: Moummad, Ilyass, et al.
Published: (2023)
Regularized Contrastive Pre-training for Few-shot Bioacoustic Sound Detection
by: Moummad, Ilyass, et al.
Published: (2023)
Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
by: Passoni, Riccardo, et al.
Published: (2025)
Towards Low-Latency Tracking of Multiple Speakers With Short-Context Speaker Embeddings
by: Iatariene, Taous, et al.
Published: (2025)
Domain-Invariant Representation Learning of Bird Sounds
by: Moummad, Ilyass, et al.
Published: (2024)
A multimodal dynamical variational autoencoder for audiovisual speech representation learning
by: Sadok, Samir, et al.
Published: (2023)
The impact of non-target events in synthetic soundscapes for sound event detection
by: Ronchini, Francesca, et al.
Published: (2021)
Speaker Embeddings to Improve Tracking of Intermittent and Moving Speakers
by: Iatariene, Taous, et al.
Published: (2025)
Speech Enhancement and Dereverberation with Diffusion-based Generative Models
by: Richter, Julius, et al.
Published: (2022)
Latent Watermarking of Audio Generative Models
by: Roman, Robin San, et al.
Published: (2024)
A decade of DCASE: Achievements, practices, evaluations and future challenges
by: Mesaros, Annamaria, et al.
Published: (2024)
Diffusion-based Signal Refiner for Speech Enhancement and Separation
by: Hirano, Masato, et al.
Published: (2023)
Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds
by: Moummad, Ilyass, et al.
Published: (2024)
An Analysis of the Variance of Diffusion-based Speech Enhancement
by: Lay, Bunlong, et al.
Published: (2024)
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
by: Lemercier, Jean-Marie, et al.
Published: (2022)
Unsupervised Speech Enhancement using Data-defined Priors
by: Klement, Dominik, et al.
Published: (2025)
Absorbing Discrete Diffusion for Speech Enhancement
by: Gonzalez, Philippe
Published: (2026)
Unified Architecture and Unsupervised Speech Disentanglement for Speaker Embedding-Free Enrollment in Personalized Speech Enhancement
by: Huang, Ziling, et al.
Published: (2025)
Data-independent Beamforming for End-to-end Multichannel Multi-speaker ASR
by: Cui, Can, et al.
Published: (2025)
Diffusion-based Speech Enhancement with Schrödinger Bridge and Symmetric Noise Schedule
by: Wang, Siyi, et al.
Published: (2024)
Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement
by: Li, Chenda, et al.
Published: (2024)
ArtiFree: Detecting and Reducing Generative Artifacts in Diffusion-based Speech Enhancement
by: Chhaglani, Bhawana, et al.
Published: (2025)