| Main Authors: | Tuncay, Ludovic, Labbé, Etienne, Pellegrini, Thomas |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.21826 |
Similar Items
Audio-JEPA: Joint-Embedding Predictive Architecture for Audio Representation Learning
by: Tuncay, Ludovic, et al.
Published: (2025)
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
by: Xu, Xuenan, et al.
Published: (2023)
LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
by: Papaioannou, Charilaos, et al.
Published: (2024)
Geo-ATBench: A Benchmark for Geospatial Audio Tagging with Geospatial Semantic Context
by: Hou, Yuanbo, et al.
Published: (2026)
Music Foundation Model as Generic Booster for Music Downstream Tasks
by: Liao, WeiHsiang, et al.
Published: (2024)
Just Label the Repeats for In-The-Wild Audio-to-Score Alignment
by: Bukey, Irmak, et al.
Published: (2024)
Sound Tagging in Infant-centric Home Soundscapes
by: Khan, Mohammad Nur Hossain, et al.
Published: (2024)
Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data
by: Mahdi, Hamza, et al.
Published: (2024)
Semantic-Aware Interpretable Multimodal Music Auto-Tagging
by: Patakis, Andreas, et al.
Published: (2025)
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
by: Kong, Zhifeng, et al.
Published: (2024)
Diffusion Models for Audio Restoration
by: Lemercier, Jean-Marie, et al.
Published: (2024)
Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
by: Feng, Tiantian, et al.
Published: (2024)
A2SB: Audio-to-Audio Schrodinger Bridges
by: Kong, Zhifeng, et al.
Published: (2025)
An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging
by: Meseguer-Brocal, Gabriel, et al.
Published: (2024)
Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs
by: Sinha, Anshuman, et al.
Published: (2024)
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
by: Gupta, Isha, et al.
Published: (2025)
A Language Model With Million Context Length For Raw Audio
by: Verma, Prateek
Published: (2022)
Text-Queried Audio Source Separation via Hierarchical Modeling
by: Yin, Xinlei, et al.
Published: (2025)
Streaming Audio Transformers for Online Audio Tagging
by: Dinkel, Heinrich, et al.
Published: (2023)
Enhancing Synthetic Training Data for Speech Commands: From ASR-Based Filtering to Domain Adaptation in SSL Latent Space
by: Quintas, Sebastião, et al.
Published: (2024)
Cross-utterance ASR Rescoring with Graph-based Label Propagation
by: Tankasala, Srinath, et al.
Published: (2023)
Do Audio-Language Models Understand Linguistic Variations?
by: Selvakumar, Ramaneswaran, et al.
Published: (2024)
Label-Context-Dependent Internal Language Model Estimation for CTC
by: Yang, Zijian, et al.
Published: (2025)
COVID-19 Detection System: A Comparative Analysis of System Performance Based on Acoustic Features of Cough Audio Signals
by: Shati, Asmaa, et al.
Published: (2023)
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
by: Primus, Paul, et al.
Published: (2025)
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)
The Rarity of Musical Audio Signals Within the Space of Possible Audio Generation
by: Collins, Nick
Published: (2024)
Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval
by: Primus, Paul, et al.
Published: (2024)
Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval
by: Primus, Paul, et al.
Published: (2024)
RiTTA: Modeling Event Relations in Text-to-Audio Generation
by: He, Yuhang, et al.
Published: (2024)
On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis
by: Sarkar, Eklavya, et al.
Published: (2024)
Continued Pretraining for Low-Resource Swahili ASR: Achieving State-of-the-Art Performance with Minimal Labeled Data
by: Mutisya, Hillary, et al.
Published: (2026)
Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos
by: Fedorishin, Dennis, et al.
Published: (2024)
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
by: Takeuchi, Daiki, et al.
Published: (2025)
Audio Geolocation: A Natural Sounds Benchmark
by: Chasmai, Mustafa, et al.
Published: (2025)
Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation
by: Fichtinger, Alexander, et al.
Published: (2025)
uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures
by: Tabassum, Afrina, et al.
Published: (2024)
Unsupervised Composable Representations for Audio
by: Bindi, Giovanni, et al.
Published: (2024)
Multi-bit Audio Watermarking
by: Lanzendörfer, Luca A., et al.
Published: (2025)
Instabilities in Convnets for Raw Audio
by: Haider, Daniel, et al.
Published: (2023)