| Main Authors: | Pepino, Leonardo, Riera, Pablo, Kamienkowski, Juan, Ferrer, Luciana |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.16849 |
Similar Items
EnCodecMAE: Leveraging neural codecs for universal audio representation learning
by: Pepino, Leonardo, et al.
Published: (2023)
Fusion approaches for emotion recognition from speech using acoustic and text-based features
by: Pepino, Leonardo, et al.
Published: (2024)
Benchmarking Time-localized Explanations for Audio Classification Models
by: Bolaños, Cecilia, et al.
Published: (2025)
The Unreliability of Acoustic Systems in Alzheimer's Speech Datasets with Heterogeneous Recording Conditions
by: Gauder, Lara, et al.
Published: (2024)
Transformation of audio embeddings into interpretable, concept-based representations
by: Zhang, Alice, et al.
Published: (2025)
Training chord recognition models on artificially generated audio
by: Majchrzak, Martyna, et al.
Published: (2025)
Late fusion ensembles for speech recognition on diverse input audio representations
by: Jezidžić, Marin, et al.
Published: (2024)
Investigating self-supervised representations for audio-visual deepfake detection
by: Boldisor, Dragos-Alexandru, et al.
Published: (2025)
Exploring bat song syllable representations in self-supervised audio encoders
by: Kloots, Marianne de Heer, et al.
Published: (2024)
A contrastive-learning approach for auditory attention detection
by: Bajestan, Seyed Ali Alavi, et al.
Published: (2024)
Do we need more complex representations for structure? A comparison of note duration representation for Music Transformers
by: Souza, Gabriel, et al.
Published: (2024)
Enabling automatic transcription of child-centered audio recordings from real-world environments
by: Kocharov, Daniil, et al.
Published: (2025)
IsoNet: Spatially-aware audio-visual target speech extraction in complex acoustic environments
by: Padhya, Dinanath, et al.
Published: (2026)
The silence of the weights: a structural pruning strategy for attention-based audio signal architectures with second order metrics
by: Diecidue, Andrea, et al.
Published: (2025)
Towards generalizing deep-audio fake detection networks
by: Gasenzer, Konstantin, et al.
Published: (2023)
A Dataset for Automatic Assessment of TTS Quality in Spanish
by: Welford, Alejandro Sosa, et al.
Published: (2025)
Versatile audio-visual learning for emotion recognition
by: Goncalves, Lucas, et al.
Published: (2023)
Testing chatbots on the creation of encoders for audio conditioned image generation
by: León, Jorge E., et al.
Published: (2025)
Unsupervised outlier detection to improve bird audio dataset labels
by: Collins, Bruce
Published: (2025)
Combining audio control and style transfer using latent diffusion
by: Demerlé, Nils, et al.
Published: (2024)
Adaptive vector steering: A training-free, layer-wise intervention for hallucination mitigation in large audio and multimodal models
by: Lin, Tsung-En, et al.
Published: (2025)
Mitigating data replication in text-to-audio generative diffusion models through anti-memorization guidance
by: Messina, Francisco, et al.
Published: (2025)
Recomposer: Event-roll-guided generative audio editing
by: Ellis, Daniel P. W., et al.
Published: (2025)
Adversarial multi-task underwater acoustic target recognition: towards robustness against various influential factors
by: Xie, Yuan, et al.
Published: (2024)
Sentiment analysis in non-fixed length audios using a Fully Convolutional Neural Network
by: García-Ordás, María Teresa, et al.
Published: (2024)
Decodable but not structured: linear probing enables Underwater Acoustic Target Recognition with pretrained audio embeddings
by: Hummel, Hilde I., et al.
Published: (2026)
Mixer is more than just a model
by: Ji, Qingfeng, et al.
Published: (2024)
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
by: Robinson, David, et al.
Published: (2024)
BELT-2: Bootstrapping EEG-to-Language representation alignment for multi-task brain decoding
by: Zhou, Jinzhao, et al.
Published: (2024)
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
by: Siuzdak, Hubert
Published: (2023)
Towards auditory attention decoding with noise-tagging: A pilot study
by: Scheppink, H. A., et al.
Published: (2024)
Benchmarks and leaderboards for sound demixing tasks
by: Solovyev, Roman, et al.
Published: (2023)
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)
Joint sentiment analysis of lyrics and audio in music
by: Schaab, Lea, et al.
Published: (2024)
Character-aware audio-visual subtitling in context
by: Huh, Jaesung, et al.
Published: (2024)
Visual representations in the human brain are aligned with large language models
by: Doerig, Adrien, et al.
Published: (2022)
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution
by: Pizarro, Matías, et al.
Published: (2023)
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
by: Primus, Paul, et al.
Published: (2025)
A Toolkit for Detecting Spurious Correlations in Speech Datasets
by: Gauder, Lara, et al.
Published: (2026)
Supervised contrastive learning from weakly-labeled audio segments for musical version matching
by: Serrà, Joan, et al.
Published: (2025)