| Main Authors: | Yuksel, Goksenin, van Gerven, Marcel, van der Heijden, Kiki |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.03307 |
Similar Items
GRAM: Spatial general-purpose audio representation models for real-world applications
by: Yuksel, Goksenin, et al.
Published: (2025)
WavJEPA: Semantic learning unlocks robust audio foundation models for raw waveforms
by: Yuksel, Goksenin, et al.
Published: (2025)
Leveraging Spatial Cues from Cochlear Implant Microphones to Efficiently Enhance Speech Separation in Real-World Listening Scenes
by: Olalere, Feyisayo, et al.
Published: (2025)
Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments
by: Ledder, Wessel, et al.
Published: (2024)
BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization
by: Kuang, Sheng, et al.
Published: (2022)
Self-supervised learning method using multiple sampling strategies for general-purpose audio representation
by: Kuroyanagi, Ibuki, et al.
Published: (2025)
Speech Separation for Hearing-Impaired Children in the Classroom
by: Olalere, Feyisayo, et al.
Published: (2025)
Enabling automatic transcription of child-centered audio recordings from real-world environments
by: Kocharov, Daniil, et al.
Published: (2025)
IsoNet: Spatially-aware audio-visual target speech extraction in complex acoustic environments
by: Padhya, Dinanath, et al.
Published: (2026)
Emoanti: audio anti-deepfake with refined emotion-guided representations
by: Li, Xiaokang, et al.
Published: (2025)
Efficient learning-based sound propagation for virtual and real-world audio processing applications
by: Ratnarajah, Anton Jeran
Published: (2024)
Scaling up masked audio encoder learning for general audio classification
by: Dinkel, Heinrich, et al.
Published: (2024)
Spatial-CLAP: Learning Spatially-Aware audio--text Embeddings for Multi-Source Conditions
by: Seki, Kentaro, et al.
Published: (2025)
FoleyGRAM: Video-to-Audio Generation with GRAM-Aligned Multimodal Encoders
by: Gramaccioni, Riccardo Fosco, et al.
Published: (2025)
Decodable but not structured: linear probing enables Underwater Acoustic Target Recognition with pretrained audio embeddings
by: Hummel, Hilde I., et al.
Published: (2026)
Multi-layer attentive probing improves transfer of audio representations for bioacoustics
by: Miron, Marius, et al.
Published: (2026)
Transformation of audio embeddings into interpretable, concept-based representations
by: Zhang, Alice, et al.
Published: (2025)
AxLSTMs: learning self-supervised audio representations with xLSTMs
by: Yadav, Sarthak, et al.
Published: (2024)
Visual-based spatial audio generation system for multi-speaker environments
by: Liu, Xiaojing, et al.
Published: (2025)
Investigating self-supervised representations for audio-visual deepfake detection
by: Boldisor, Dragos-Alexandru, et al.
Published: (2025)
Keep what you need : extracting efficient subnetworks from large audio representation models
by: Genova, David, et al.
Published: (2025)
Late fusion ensembles for speech recognition on diverse input audio representations
by: Jezidžić, Marin, et al.
Published: (2024)
Towards generalizing deep-audio fake detection networks
by: Gasenzer, Konstantin, et al.
Published: (2023)
SS-DPPN: A self-supervised dual-path foundation model for the generalizable cardiac audio representation
by: Muna, Ummy Maria, et al.
Published: (2025)
EnCodecMAE: Leveraging neural codecs for universal audio representation learning
by: Pepino, Leonardo, et al.
Published: (2023)
AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)
Better audio representations are more brain-like: linking model-brain alignment with performance in downstream auditory tasks
by: Pepino, Leonardo, et al.
Published: (2025)
Exploring bat song syllable representations in self-supervised audio encoders
by: Kloots, Marianne de Heer, et al.
Published: (2024)
Making deep neural networks work for medical audio: representation, compression and domain adaptation
by: Onu, Charles C
Published: (2025)
An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)
ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
by: Jing, Xin, et al.
Published: (2024)
Sonalyzer-Moz: A Framework for Analyzing the Structure of Mozart's Sonata Form
by: Zhao, Jing, et al.
Published: (2026)
Bird detection in audio: a survey and a challenge
by: Stowell, Dan, et al.
Published: (2016)
Stage-adaptive audio diffusion modeling
by: Zhang, Xuanhao, et al.
Published: (2026)
TQCodec: Towards neural audio codec for high-fidelity music streaming
by: He, Lixing, et al.
Published: (2026)
Towards audio language modeling -- an overview
by: Wu, Haibin, et al.
Published: (2024)
On Correlating Factors for Domain Adaptation Performance
by: Yuksel, Goksenin, et al.
Published: (2025)
Interpretability Analysis of Domain Adapted Dense Retrievers
by: Yuksel, Goksenin, et al.
Published: (2025)
Training chord recognition models on artificially generated audio
by: Majchrzak, Martyna, et al.
Published: (2025)
Are audio DeepFake detection models polyglots?
by: Marek, Bartłomiej, et al.
Published: (2024)