:: Library Catalog

Bibliographic Details
Main Authors:	Douwes, Constance; Serizel, Romain
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Sound
Online Access:	https://arxiv.org/abs/2409.05080

Similar Items

Energy Consumption Trends in Sound Event Detection Systems
by: Douwes, Constance, et al.
Published: (2024)

The Costs of Reproducibility in Music Separation Research: a Replication of Band-Split RNN
by: Magron, Paul, et al.
Published: (2026)

Normalizing Energy Consumption for Hardware-Independent Evaluation
by: Douwes, Constance, et al.
Published: (2024)

Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
by: Passoni, Riccardo, et al.
Published: (2025)

A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes
by: Ronchini, Francesca, et al.
Published: (2022)

Self-Supervised Learning for Few-Shot Bird Sound Classification
by: Moummad, Ilyass, et al.
Published: (2023)

Regularized Contrastive Pre-training for Few-shot Bioacoustic Sound Detection
by: Moummad, Ilyass, et al.
Published: (2023)

The impact of non-target events in synthetic soundscapes for sound event detection
by: Ronchini, Francesca, et al.
Published: (2021)

DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels
by: Cornell, Samuele, et al.
Published: (2024)

Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds
by: Moummad, Ilyass, et al.
Published: (2024)

Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement
by: Sadeghi, Mostafa, et al.
Published: (2025)

Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement
by: Cheng, Longbiao, et al.
Published: (2024)

Performance and energy balance: a comprehensive study of state-of-the-art sound event detection systems
by: Ronchini, Francesca, et al.
Published: (2023)

Diffusion-based Unsupervised Audio-visual Speech Enhancement
by: Ayilo, Jean-Eudes, et al.
Published: (2024)

Metric Analysis for Spatial Semantic Segmentation of Sound Scenes
by: Mishra, Mayank, et al.
Published: (2025)

Frequency-Weighted Training Losses for Phoneme-Level DNN-based Speech Enhancement
by: Monir, Nasser-Eddine, et al.
Published: (2025)

Angular Distance Distribution Loss for Audio Classification
by: Almudévar, Antonio, et al.
Published: (2024)

Test-Time Training for Depression Detection
by: Dumpala, Sri Harsha, et al.
Published: (2024)

Test-Time Training for Speech Enhancement
by: Behera, Avishkar, et al.
Published: (2025)

Efficient Continual Learning in Keyword Spotting using Binary Neural Networks
by: Vu, Quynh Nguyen-Phuong, et al.
Published: (2025)

Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget
by: Liu, Andy T., et al.
Published: (2024)

Tracking of Intermittent and Moving Speakers: Dataset and Metrics
by: Iatariene, Taous, et al.
Published: (2025)

A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms
by: Monir, Nasser-Eddine, et al.
Published: (2024)

Evaluating Multichannel Speech Enhancement Algorithms at the Phoneme Scale Across Genders
by: Monir, Nasser-Eddine, et al.
Published: (2025)

EmoHRNet: High-Resolution Neural Network Based Speech Emotion Recognition
by: Muppidi, Akshay, et al.
Published: (2025)

Distributed Acoustic Sensing for Urban Traffic Monitoring: Spatio-Temporal Attention in Recurrent Neural Networks
by: Fakhruzi, Izhan, et al.
Published: (2026)

Computational music analysis from first principles
by: Tymoczko, Dmitri, et al.
Published: (2024)

Score-Based Training for Energy-Based TTS Models
by: Sun, Wanli, et al.
Published: (2025)

Diffusion-based Frameworks for Unsupervised Speech Enhancement
by: Ayilo, Jean-Eudes, et al.
Published: (2026)

From Real to Cloned Singer Identification
by: Desblancs, Dorian, et al.
Published: (2024)

Towards Low-Latency Tracking of Multiple Speakers With Short-Context Speaker Embeddings
by: Iatariene, Taous, et al.
Published: (2025)

Combolutional Neural Networks
by: Churchwell, Cameron, et al.
Published: (2025)

Domain-Invariant Representation Learning of Bird Sounds
by: Moummad, Ilyass, et al.
Published: (2024)

Speech Command Recognition Using LogNNet Reservoir Computing for Embedded Systems
by: Izotov, Yuriy, et al.
Published: (2025)

Causal Prosody Mediation for Text-to-Speech: Counterfactual Training of Duration, Pitch, and Energy in FastSpeech2
by: Mohanty, Suvendu Sekhar
Published: (2026)

Audio-Visual Continual Test-Time Adaptation without Forgetting
by: Maharana, Sarthak Kumar, et al.
Published: (2026)

Training-Free Multimodal Guidance for Video to Audio Generation
by: Grassucci, Eleonora, et al.
Published: (2025)

Training chord recognition models on artificially generated audio
by: Majchrzak, Martyna, et al.
Published: (2025)

SOI: Scaling Down Computational Complexity by Estimating Partial States of the Model
by: Stefański, Grzegorz, et al.
Published: (2024)

From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers
by: Feng, Jiu, et al.
Published: (2024)