:: Library Catalog

Bibliographic Details
Main Authors:	Pepino, Leonardo; Riera, Pablo; Kamienkowski, Juan; Ferrer, Luciana
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Sound
Online Access:	https://arxiv.org/abs/2511.16849

Similar Items

EnCodecMAE: Leveraging neural codecs for universal audio representation learning
by: Pepino, Leonardo, et al.
Published: (2023)

Fusion approaches for emotion recognition from speech using acoustic and text-based features
by: Pepino, Leonardo, et al.
Published: (2024)

Benchmarking Time-localized Explanations for Audio Classification Models
by: Bolaños, Cecilia, et al.
Published: (2025)

The Unreliability of Acoustic Systems in Alzheimer's Speech Datasets with Heterogeneous Recording Conditions
by: Gauder, Lara, et al.
Published: (2024)

Transformation of audio embeddings into interpretable, concept-based representations
by: Zhang, Alice, et al.
Published: (2025)

Training chord recognition models on artificially generated audio
by: Majchrzak, Martyna, et al.
Published: (2025)

Late fusion ensembles for speech recognition on diverse input audio representations
by: Jezidžić, Marin, et al.
Published: (2024)

Investigating self-supervised representations for audio-visual deepfake detection
by: Boldisor, Dragos-Alexandru, et al.
Published: (2025)

Exploring bat song syllable representations in self-supervised audio encoders
by: Kloots, Marianne de Heer, et al.
Published: (2024)

A contrastive-learning approach for auditory attention detection
by: Bajestan, Seyed Ali Alavi, et al.
Published: (2024)

Do we need more complex representations for structure? A comparison of note duration representation for Music Transformers
by: Souza, Gabriel, et al.
Published: (2024)

Enabling automatic transcription of child-centered audio recordings from real-world environments
by: Kocharov, Daniil, et al.
Published: (2025)

IsoNet: Spatially-aware audio-visual target speech extraction in complex acoustic environments
by: Padhya, Dinanath, et al.
Published: (2026)

The silence of the weights: a structural pruning strategy for attention-based audio signal architectures with second order metrics
by: Diecidue, Andrea, et al.
Published: (2025)

Towards generalizing deep-audio fake detection networks
by: Gasenzer, Konstantin, et al.
Published: (2023)

A Dataset for Automatic Assessment of TTS Quality in Spanish
by: Welford, Alejandro Sosa, et al.
Published: (2025)

Versatile audio-visual learning for emotion recognition
by: Goncalves, Lucas, et al.
Published: (2023)

Testing chatbots on the creation of encoders for audio conditioned image generation
by: León, Jorge E., et al.
Published: (2025)

Unsupervised outlier detection to improve bird audio dataset labels
by: Collins, Bruce
Published: (2025)

Combining audio control and style transfer using latent diffusion
by: Demerlé, Nils, et al.
Published: (2024)

Adaptive vector steering: A training-free, layer-wise intervention for hallucination mitigation in large audio and multimodal models
by: Lin, Tsung-En, et al.
Published: (2025)

Mitigating data replication in text-to-audio generative diffusion models through anti-memorization guidance
by: Messina, Francisco, et al.
Published: (2025)

Recomposer: Event-roll-guided generative audio editing
by: Ellis, Daniel P. W., et al.
Published: (2025)

Adversarial multi-task underwater acoustic target recognition: towards robustness against various influential factors
by: Xie, Yuan, et al.
Published: (2024)

Sentiment analysis in non-fixed length audios using a Fully Convolutional Neural Network
by: García-Ordás, María Teresa, et al.
Published: (2024)

Decodable but not structured: linear probing enables Underwater Acoustic Target Recognition with pretrained audio embeddings
by: Hummel, Hilde I., et al.
Published: (2026)

Mixer is more than just a model
by: Ji, Qingfeng, et al.
Published: (2024)

NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
by: Robinson, David, et al.
Published: (2024)

BELT-2: Bootstrapping EEG-to-Language representation alignment for multi-task brain decoding
by: Zhou, Jinzhao, et al.
Published: (2024)

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
by: Siuzdak, Hubert
Published: (2023)

Towards auditory attention decoding with noise-tagging: A pilot study
by: Scheppink, H. A., et al.
Published: (2024)

Benchmarks and leaderboards for sound demixing tasks
by: Solovyev, Roman, et al.
Published: (2023)

VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)

Joint sentiment analysis of lyrics and audio in music
by: Schaab, Lea, et al.
Published: (2024)

Character-aware audio-visual subtitling in context
by: Huh, Jaesung, et al.
Published: (2024)

Visual representations in the human brain are aligned with large language models
by: Doerig, Adrien, et al.
Published: (2022)

DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution
by: Pizarro, Matías, et al.
Published: (2023)

TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
by: Primus, Paul, et al.
Published: (2025)

A Toolkit for Detecting Spurious Correlations in Speech Datasets
by: Gauder, Lara, et al.
Published: (2026)

Supervised contrastive learning from weakly-labeled audio segments for musical version matching
by: Serrà, Joan, et al.
Published: (2025)