:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Shanshan; Tripathy, Soumya; Heittola, Toni; Mesaros, Annamaria
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2402.02899

Similar Items

Online incremental learning for audio classification using a pretrained audio model
by: Mulimani, Manjunath, et al.
Published: (2025)

A decade of DCASE: Achievements, practices, evaluations and future challenges
by: Mesaros, Annamaria, et al.
Published: (2024)

Sound event detection with audio-text models and heterogeneous temporal annotations
by: Harju, Manu, et al.
Published: (2025)

Low-Complexity Acoustic Scene Classification with Device Information in the DCASE 2025 Challenge
by: Schmid, Florian, et al.
Published: (2025)

Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge
by: Schmid, Florian, et al.
Published: (2024)

Incremental learning for audio classification with Hebbian Deep Neural Networks
by: Casciotti, Riccardo, et al.
Published: (2026)

Domain-Incremental Learning for Audio Classification
by: Mulimani, Manjunath, et al.
Published: (2024)

Online Domain-Incremental Learning Approach to Classify Acoustic Scenes in All Locations
by: Mulimani, Manjunath, et al.
Published: (2024)

Class-Incremental Learning for Multi-Label Audio Classification
by: Mulimani, Manjunath, et al.
Published: (2024)

Class-Incremental Learning for Sound Event Localization and Detection
by: Pandey, Ruchi, et al.
Published: (2024)

AxLSTMs: learning self-supervised audio representations with xLSTMs
by: Yadav, Sarthak, et al.
Published: (2024)

Self-supervised learning method using multiple sampling strategies for general-purpose audio representation
by: Kuroyanagi, Ibuki, et al.
Published: (2025)

Domain-Agnostic Incremental Learning for Sound Classification. A DCASE 2026 Challenge task
by: Casciotti, Riccardo, et al.
Published: (2026)

Multi-label Zero-Shot Audio Classification with Temporal Attention
by: Dogan, Duygu, et al.
Published: (2024)

Sound Event Detection and Localization with Distance Estimation
by: Krause, Daniel Aleksander, et al.
Published: (2024)

An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)

Computer Audition: From Task-Specific Machine Learning to Foundation Models
by: Triantafyllopoulos, Andreas, et al.
Published: (2024)

Curriculum learning for self-supervised speaker verification
by: Heo, Hee-Soo, et al.
Published: (2022)

Scaling up masked audio encoder learning for general audio classification
by: Dinkel, Heinrich, et al.
Published: (2024)

The role of audio-visual integration in the time course of phonetic encoding in self-supervised speech models
by: Wang, Yi, et al.
Published: (2025)

Equivariance-based self-supervised learning for audio signal recovery from clipped measurements
by: Sechaud, Victor, et al.
Published: (2024)

DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels
by: Cornell, Samuele, et al.
Published: (2024)

LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
by: Dutta, Soumya, et al.
Published: (2025)

Exploring bat song syllable representations in self-supervised audio encoders
by: Kloots, Marianne de Heer, et al.
Published: (2024)

Automated data curation for self-supervised learning in underwater acoustic analysis
by: Hummel, Hilde I, et al.
Published: (2025)

Modeling strategies for speech enhancement in the latent space of a neural audio codec
by: Kammoun, Sofiene, et al.
Published: (2025)

An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge
by: Han, Runduo, et al.
Published: (2024)

From perception to production: how acoustic invariance facilitates articulatory learning in a self-supervised vocal imitation model
by: Lavechin, Marvin, et al.
Published: (2025)

Unmasking real-world audio deepfakes: A data-centric approach
by: Combei, David, et al.
Published: (2025)

Deep learning based spatial aliasing reduction in beamforming for audio capture
by: Guzik, Mateusz, et al.
Published: (2025)

Investigating self-supervised features for expressive, multilingual voice conversion
by: Martín-Cortinas, Álvaro, et al.
Published: (2025)

Meta-learning-based percussion transcription and tāla identification from low-resource audio
by: Kodag, Rahul Bapusaheb, et al.
Published: (2025)

AudioMorphix: Training-free audio editing with diffusion probabilistic models
by: Liang, Jinhua, et al.
Published: (2025)

Acoustic and Semantic Modeling of Emotion in Spoken Language
by: Dutta, Soumya
Published: (2026)

Efficient learning-based sound propagation for virtual and real-world audio processing applications
by: Ratnarajah, Anton Jeran
Published: (2024)

WavJEPA: Semantic learning unlocks robust audio foundation models for raw waveforms
by: Yuksel, Goksenin, et al.
Published: (2025)

WavLM model ensemble for audio deepfake detection
by: Combei, David, et al.
Published: (2024)

Cryfish: On deep audio analysis with Large Language Models
by: Mitrofanov, Anton, et al.
Published: (2025)

Multiple Hankel matrix rank minimization for audio inpainting
by: Záviška, Pavel, et al.
Published: (2023)

SCORE: Scaling audio generation using Standardized COmposite REwards
by: Jung, Jaemin, et al.
Published: (2025)