:: Library Catalog

Bibliographic Details
Main Authors:	Singh, Shubhr, Benetos, Emmanouil, Phan, Huy, Stowell, Dan
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2501.03464

Similar Items

GraFPrint: A GNN-Based Approach for Audio Identification
by: Bhattacharjee, Aditya, et al.
Published: (2024)

ST-ITO: Controlling Audio Effects for Style Transfer with Inference-Time Optimization
by: Steinmetz, Christian J., et al.
Published: (2024)

Audio-JEPA: Joint-Embedding Predictive Architecture for Audio Representation Learning
by: Tuncay, Ludovic, et al.
Published: (2025)

Acoustic identification of individual animals with hierarchical contrastive learning
by: Nolasco, Ines, et al.
Published: (2024)

LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
by: Papaioannou, Charilaos, et al.
Published: (2024)

Compressing Quaternion Convolutional Neural Networks for Audio Classification
by: Singh, Arshdeep, et al.
Published: (2025)

Audio Mamba: Pretrained Audio State Space Model For Audio Tagging
by: Lin, Jiaju, et al.
Published: (2024)

Integrating IP Broadcasting with Audio Tags: Workflow and Challenges
by: Burchett-Vass, Rhys, et al.
Published: (2024)

Perceptual Musical Features for Interpretable Audio Tagging
by: Lyberatos, Vassilis, et al.
Published: (2023)

Classification of Spontaneous and Scripted Speech for Multilingual Audio
by: Elisha, Shahar, et al.
Published: (2024)

Raw Audio Classification with Cosine Convolutional Neural Network (CosCovNN)
by: Haque, Kazi Nazmul, et al.
Published: (2024)

Learning Music Audio Representations With Limited Data
by: Plachouras, Christos, et al.
Published: (2025)

CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following
by: Ma, Yinghao, et al.
Published: (2025)

Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)

Comprehensive Evaluation of CNN-Based Audio Tagging Models on Resource-Constrained Devices
by: Grau-Haro, Jordi, et al.
Published: (2025)

Mind the Domain Gap: a Systematic Analysis on Bioacoustic Sound Event Detection
by: Liang, Jinhua, et al.
Published: (2024)

Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model
by: Huang, Jiawen, et al.
Published: (2024)

Fundamental Survey on Neuromorphic Based Audio Classification
by: Basu, Amlan, et al.
Published: (2025)

EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
by: Kim, Jaeyeon, et al.
Published: (2024)

GraphMuse: A Library for Symbolic Music Graph Processing
by: Karystinaios, Emmanouil, et al.
Published: (2024)

Domain-Invariant Representation Learning of Bird Sounds
by: Moummad, Ilyass, et al.
Published: (2024)

Audio-to-Image Encoding for Improved Voice Characteristic Detection Using Deep Convolutional Neural Networks
by: Atif, Youness
Published: (2025)

In-the-wild Audio Spatialization with Flexible Text-guided Localization
by: Pan, Tianrui, et al.
Published: (2025)

HyperPotter: Spell the Charm of High-Order Interactions in Audio Deepfake Detection
by: Wen, Qing, et al.
Published: (2026)

RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection
by: Chang, Sungkyun, et al.
Published: (2025)

BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
by: Rauch, Lukas, et al.
Published: (2024)

Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification
by: Dawn, Aditya, et al.
Published: (2024)

Enhancing Partially Spoofed Audio Localization with Boundary-aware Attention Mechanism
by: Zhong, Jiafeng, et al.
Published: (2024)

AFEN: Respiratory Disease Classification using Ensemble Learning
by: Nadkarni, Rahul, et al.
Published: (2024)

SpectroStream: A Versatile Neural Codec for General Audio
by: Li, Yunpeng, et al.
Published: (2025)

Temporal Information Reconstruction and Non-Aligned Residual in Spiking Neural Networks for Speech Classification
by: Zhang, Qi, et al.
Published: (2024)

Quantum-Inspired Audio Unlearning: Towards Privacy-Preserving Voice Biometrics
by: Pathak, Shreyansh, et al.
Published: (2025)

Automatic acoustic detection of birds through deep learning: the first Bird Audio Detection challenge
by: Stowell, Dan, et al.
Published: (2018)

4,500 Seconds: Small Data Training Approaches for Deep UAV Audio Classification
by: Berg, Andrew P., et al.
Published: (2025)

ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation
by: Feng, Tiantian, et al.
Published: (2024)

AND: Audio Network Dissection for Interpreting Deep Acoustic Models
by: Wu, Tung-Yu, et al.
Published: (2024)

How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?
by: Liu, Tianchi, et al.
Published: (2024)

Towards Leveraging Contrastively Pretrained Neural Audio Embeddings for Recommender Tasks
by: Grötschla, Florian, et al.
Published: (2024)

Domain Adaptation Method and Modality Gap Impact in Audio-Text Models for Prototypical Sound Classification
by: Acevedo, Emiliano, et al.
Published: (2025)

Audio Deepfake Detection in the Age of Advanced Text-to-Speech models
by: Singh, Robin, et al.
Published: (2026)