:: Library Catalog

Bibliographic Details
Main Authors:	Yuksel, Goksenin; van Gerven, Marcel; van der Heijden, Kiki
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2602.03307

Similar Items

GRAM: Spatial general-purpose audio representation models for real-world applications
by: Yuksel, Goksenin, et al.
Published: (2025)

WavJEPA: Semantic learning unlocks robust audio foundation models for raw waveforms
by: Yuksel, Goksenin, et al.
Published: (2025)

Leveraging Spatial Cues from Cochlear Implant Microphones to Efficiently Enhance Speech Separation in Real-World Listening Scenes
by: Olalere, Feyisayo, et al.
Published: (2025)

Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments
by: Ledder, Wessel, et al.
Published: (2024)

BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization
by: Kuang, Sheng, et al.
Published: (2022)

Self-supervised learning method using multiple sampling strategies for general-purpose audio representation
by: Kuroyanagi, Ibuki, et al.
Published: (2025)

Speech Separation for Hearing-Impaired Children in the Classroom
by: Olalere, Feyisayo, et al.
Published: (2025)

Enabling automatic transcription of child-centered audio recordings from real-world environments
by: Kocharov, Daniil, et al.
Published: (2025)

IsoNet: Spatially-aware audio-visual target speech extraction in complex acoustic environments
by: Padhya, Dinanath, et al.
Published: (2026)

Emoanti: audio anti-deepfake with refined emotion-guided representations
by: Li, Xiaokang, et al.
Published: (2025)

Efficient learning-based sound propagation for virtual and real-world audio processing applications
by: Ratnarajah, Anton Jeran
Published: (2024)

Scaling up masked audio encoder learning for general audio classification
by: Dinkel, Heinrich, et al.
Published: (2024)

Spatial-CLAP: Learning Spatially-Aware audio--text Embeddings for Multi-Source Conditions
by: Seki, Kentaro, et al.
Published: (2025)

FoleyGRAM: Video-to-Audio Generation with GRAM-Aligned Multimodal Encoders
by: Gramaccioni, Riccardo Fosco, et al.
Published: (2025)

Decodable but not structured: linear probing enables Underwater Acoustic Target Recognition with pretrained audio embeddings
by: Hummel, Hilde I., et al.
Published: (2026)

Multi-layer attentive probing improves transfer of audio representations for bioacoustics
by: Miron, Marius, et al.
Published: (2026)

Transformation of audio embeddings into interpretable, concept-based representations
by: Zhang, Alice, et al.
Published: (2025)

AxLSTMs: learning self-supervised audio representations with xLSTMs
by: Yadav, Sarthak, et al.
Published: (2024)

Visual-based spatial audio generation system for multi-speaker environments
by: Liu, Xiaojing, et al.
Published: (2025)

Investigating self-supervised representations for audio-visual deepfake detection
by: Boldisor, Dragos-Alexandru, et al.
Published: (2025)

Keep what you need : extracting efficient subnetworks from large audio representation models
by: Genova, David, et al.
Published: (2025)

Late fusion ensembles for speech recognition on diverse input audio representations
by: Jezidžić, Marin, et al.
Published: (2024)

Towards generalizing deep-audio fake detection networks
by: Gasenzer, Konstantin, et al.
Published: (2023)

SS-DPPN: A self-supervised dual-path foundation model for the generalizable cardiac audio representation
by: Muna, Ummy Maria, et al.
Published: (2025)

EnCodecMAE: Leveraging neural codecs for universal audio representation learning
by: Pepino, Leonardo, et al.
Published: (2023)

AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)

Better audio representations are more brain-like: linking model-brain alignment with performance in downstream auditory tasks
by: Pepino, Leonardo, et al.
Published: (2025)

Exploring bat song syllable representations in self-supervised audio encoders
by: Kloots, Marianne de Heer, et al.
Published: (2024)

Making deep neural networks work for medical audio: representation, compression and domain adaptation
by: Onu, Charles C
Published: (2025)

An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)

ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
by: Jing, Xin, et al.
Published: (2024)

Sonalyzer-Moz: A Framework for Analyzing the Structure of Mozart's Sonata Form
by: Zhao, Jing, et al.
Published: (2026)

Bird detection in audio: a survey and a challenge
by: Stowell, Dan, et al.
Published: (2016)

Stage-adaptive audio diffusion modeling
by: Zhang, Xuanhao, et al.
Published: (2026)

TQCodec: Towards neural audio codec for high-fidelity music streaming
by: He, Lixing, et al.
Published: (2026)

Towards audio language modeling -- an overview
by: Wu, Haibin, et al.
Published: (2024)

On Correlating Factors for Domain Adaptation Performance
by: Yuksel, Goksenin, et al.
Published: (2025)

Interpretability Analysis of Domain Adapted Dense Retrievers
by: Yuksel, Goksenin, et al.
Published: (2025)

Training chord recognition models on artificially generated audio
by: Majchrzak, Martyna, et al.
Published: (2025)

Are audio DeepFake detection models polyglots?
by: Marek, Bartłomiej, et al.
Published: (2024)