:: Library Catalog

Bibliographic Details
Main Authors:	Tuncay, Ludovic; Labbé, Etienne; Pellegrini, Thomas
Format:	Preprint
Published:	2025
Subjects:	Sound; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2503.21826

Similar Items

Audio-JEPA: Joint-Embedding Predictive Architecture for Audio Representation Learning
by: Tuncay, Ludovic, et al.
Published: (2025)

BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
by: Xu, Xuenan, et al.
Published: (2023)

LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
by: Papaioannou, Charilaos, et al.
Published: (2024)

Geo-ATBench: A Benchmark for Geospatial Audio Tagging with Geospatial Semantic Context
by: Hou, Yuanbo, et al.
Published: (2026)

Music Foundation Model as Generic Booster for Music Downstream Tasks
by: Liao, WeiHsiang, et al.
Published: (2024)

Just Label the Repeats for In-The-Wild Audio-to-Score Alignment
by: Bukey, Irmak, et al.
Published: (2024)

Sound Tagging in Infant-centric Home Soundscapes
by: Khan, Mohammad Nur Hossain, et al.
Published: (2024)

Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data
by: Mahdi, Hamza, et al.
Published: (2024)

Semantic-Aware Interpretable Multimodal Music Auto-Tagging
by: Patakis, Andreas, et al.
Published: (2025)

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
by: Kong, Zhifeng, et al.
Published: (2024)

Diffusion Models for Audio Restoration
by: Lemercier, Jean-Marie, et al.
Published: (2024)

Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
by: Feng, Tiantian, et al.
Published: (2024)

A2SB: Audio-to-Audio Schrodinger Bridges
by: Kong, Zhifeng, et al.
Published: (2025)

An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging
by: Meseguer-Brocal, Gabriel, et al.
Published: (2024)

Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs
by: Sinha, Anshuman, et al.
Published: (2024)

"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
by: Gupta, Isha, et al.
Published: (2025)

A Language Model With Million Context Length For Raw Audio
by: Verma, Prateek
Published: (2022)

Text-Queried Audio Source Separation via Hierarchical Modeling
by: Yin, Xinlei, et al.
Published: (2025)

Streaming Audio Transformers for Online Audio Tagging
by: Dinkel, Heinrich, et al.
Published: (2023)

Enhancing Synthetic Training Data for Speech Commands: From ASR-Based Filtering to Domain Adaptation in SSL Latent Space
by: Quintas, Sebastião, et al.
Published: (2024)

Cross-utterance ASR Rescoring with Graph-based Label Propagation
by: Tankasala, Srinath, et al.
Published: (2023)

Do Audio-Language Models Understand Linguistic Variations?
by: Selvakumar, Ramaneswaran, et al.
Published: (2024)

Label-Context-Dependent Internal Language Model Estimation for CTC
by: Yang, Zijian, et al.
Published: (2025)

COVID-19 Detection System: A Comparative Analysis of System Performance Based on Acoustic Features of Cough Audio Signals
by: Shati, Asmaa, et al.
Published: (2023)

TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
by: Primus, Paul, et al.
Published: (2025)

Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)

The Rarity of Musical Audio Signals Within the Space of Possible Audio Generation
by: Collins, Nick
Published: (2024)

Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval
by: Primus, Paul, et al.
Published: (2024)

Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval
by: Primus, Paul, et al.
Published: (2024)

RiTTA: Modeling Event Relations in Text-to-Audio Generation
by: He, Yuhang, et al.
Published: (2024)

On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis
by: Sarkar, Eklavya, et al.
Published: (2024)

Continued Pretraining for Low-Resource Swahili ASR: Achieving State-of-the-Art Performance with Minimal Labeled Data
by: Mutisya, Hillary, et al.
Published: (2026)

Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos
by: Fedorishin, Dennis, et al.
Published: (2024)

CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
by: Takeuchi, Daiki, et al.
Published: (2025)

Audio Geolocation: A Natural Sounds Benchmark
by: Chasmai, Mustafa, et al.
Published: (2025)

Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation
by: Fichtinger, Alexander, et al.
Published: (2025)

uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures
by: Tabassum, Afrina, et al.
Published: (2024)

Unsupervised Composable Representations for Audio
by: Bindi, Giovanni, et al.
Published: (2024)

Multi-bit Audio Watermarking
by: Lanzendörfer, Luca A., et al.
Published: (2025)

Instabilities in Convnets for Raw Audio
by: Haider, Daniel, et al.
Published: (2023)