| Main Authors: | Llave, Adrien, Granier, Emma, Pallone, Grégory |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.16715 |
Similar Items
A DNN Based Post-Filter to Enhance the Quality of Coded Speech in MDCT Domain
by: Gupta, Kishan, et al.
Published: (2022)
Learning Spatially-Aware Language and Audio Embeddings
by: Devnani, Bhavika, et al.
Published: (2024)
Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval
by: Primus, Paul, et al.
Published: (2024)
HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids
by: Wisnu, Dyah A. M. G., et al.
Published: (2024)
Towards Audio Codec-based Speech Separation
by: Yip, Jia Qi, et al.
Published: (2024)
A2SB: Audio-to-Audio Schrodinger Bridges
by: Kong, Zhifeng, et al.
Published: (2025)
Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning
by: Luong, Manh, et al.
Published: (2025)
Audio-based automatic mating success prediction of giant pandas
by: Yan, WeiRan, et al.
Published: (2019)
Investigating Faithfulness in Large Audio Language Models
by: Mousavi, Pooneh, et al.
Published: (2025)
Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
by: Hauret, Julien, et al.
Published: (2024)
Exploring Meta Information for Audio-based Zero-shot Bird Classification
by: Gebhard, Alexander, et al.
Published: (2023)
KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation
by: Chung, Yoonjin, et al.
Published: (2025)
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
by: Primus, Paul, et al.
Published: (2025)
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
by: Kong, Zhifeng, et al.
Published: (2024)
The Rarity of Musical Audio Signals Within the Space of Possible Audio Generation
by: Collins, Nick
Published: (2024)
Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval
by: Primus, Paul, et al.
Published: (2024)
Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System
by: Ali, Hashim, et al.
Published: (2025)
ImmerseDiffusion: A Generative Spatial Audio Latent Diffusion Model
by: Heydari, Mojtaba, et al.
Published: (2024)
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
by: Takeuchi, Daiki, et al.
Published: (2025)
Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos
by: Fedorishin, Dennis, et al.
Published: (2024)
Audio Geolocation: A Natural Sounds Benchmark
by: Chasmai, Mustafa, et al.
Published: (2025)
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
by: Shams, Siavash, et al.
Published: (2024)
Similarity Choice and Negative Scaling in Supervised Contrastive Learning for Deepfake Audio Detection
by: Sudan, Jaskirat, et al.
Published: (2026)
uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures
by: Tabassum, Afrina, et al.
Published: (2024)
JSQA: Speech Quality Assessment with Perceptually-Inspired Contrastive Pretraining Based on JND Audio Pairs
by: Fan, Junyi, et al.
Published: (2025)
Multi-bit Audio Watermarking
by: Lanzendörfer, Luca A., et al.
Published: (2025)
Unsupervised Composable Representations for Audio
by: Bindi, Giovanni, et al.
Published: (2024)
Diffusion Models for Audio Restoration
by: Lemercier, Jean-Marie, et al.
Published: (2024)
Instabilities in Convnets for Raw Audio
by: Haider, Daniel, et al.
Published: (2023)
Automatic Contextual Audio Denoising
by: Luong, Diep, et al.
Published: (2026)
Diffusion-Based Unsupervised Audio-Visual Speech Separation in Noisy Environments with Noise Prior
by: Yemini, Yochai, et al.
Published: (2025)
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
by: Gupta, Isha, et al.
Published: (2025)
Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs
by: Sinha, Anshuman, et al.
Published: (2024)
Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
by: Feng, Tiantian, et al.
Published: (2024)
A Data-Driven Diffusion-based Approach for Audio Deepfake Explanations
by: Grinberg, Petr, et al.
Published: (2025)
Is Audio Spoof Detection Robust to Laundering Attacks?
by: Ali, Hashim, et al.
Published: (2024)
Prompt Amplification and Zero-Shot Late Fusion in Audio-Language Models for Speech Emotion Recognition
by: Kataria, Saurabh, et al.
Published: (2026)
Lightweight DNN for Full-Band Speech Denoising on Mobile Devices: Exploiting Long and Short Temporal Patterns
by: Drossos, Konstantinos, et al.
Published: (2025)
A Novel Stochastic Transformer-based Approach for Post-Traumatic Stress Disorder Detection using Audio Recording of Clinical Interviews
by: Dia, Mamadou, et al.
Published: (2024)