| Main Authors: | Yuksel, Goksenin, Guetschel, Pierre, Tangermann, Michael, van Gerven, Marcel, van der Heijden, Kiki |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.23238 |
Similar Items
GRAM: Spatial general-purpose audio representation models for real-world applications
by: Yuksel, Goksenin, et al.
Published: (2025)
GRAM: Spatial general-purpose audio representations for real-world environments
by: Yuksel, Goksenin, et al.
Published: (2026)
Leveraging Spatial Cues from Cochlear Implant Microphones to Efficiently Enhance Speech Separation in Real-World Listening Scenes
by: Olalere, Feyisayo, et al.
Published: (2025)
Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments
by: Ledder, Wessel, et al.
Published: (2024)
BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization
by: Kuang, Sheng, et al.
Published: (2022)
Scaling up masked audio encoder learning for general audio classification
by: Dinkel, Heinrich, et al.
Published: (2024)
A robust audio deepfake detection system via multi-view feature
by: Yang, Yujie, et al.
Published: (2024)
Wav2Small: Distilling Wav2Vec2 to 72K parameters for Low-Resource Speech emotion recognition
by: Kounadis-Bastian, Dionyssos, et al.
Published: (2024)
Decodable but not structured: linear probing enables Underwater Acoustic Target Recognition with pretrained audio embeddings
by: Hummel, Hilde I., et al.
Published: (2026)
AxLSTMs: learning self-supervised audio representations with xLSTMs
by: Yadav, Sarthak, et al.
Published: (2024)
Deep learning based spatial aliasing reduction in beamforming for audio capture
by: Guzik, Mateusz, et al.
Published: (2025)
Efficient learning-based sound propagation for virtual and real-world audio processing applications
by: Ratnarajah, Anton Jeran
Published: (2024)
Self-supervised learning method using multiple sampling strategies for general-purpose audio representation
by: Kuroyanagi, Ibuki, et al.
Published: (2025)
Towards audio language modeling -- an overview
by: Wu, Haibin, et al.
Published: (2024)
WavMark: Watermarking for Audio Generation
by: Chen, Guangyu, et al.
Published: (2023)
Are audio DeepFake detection models polyglots?
by: Marek, Bartłomiej, et al.
Published: (2024)
Probing mental health information in speech foundation models
by: de Gennes, Marc, et al.
Published: (2024)
ManWav: The First Manchu ASR Model
by: Seo, Jean, et al.
Published: (2024)
Adapting WavLM for Speech Emotion Recognition
by: Diatlova, Daria, et al.
Published: (2024)
Tweaking autoregressive methods for inpainting of gaps in audio signals
by: Mokrý, Ondřej, et al.
Published: (2024)
MBCodec:Thorough disentangle for high-fidelity audio compression
by: Zhang, Ruonan, et al.
Published: (2025)
Real-time implementation of vibrato transfer as an audio effect
by: Hyrkas, Jeremy
Published: (2025)
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
by: Deng, Keqi, et al.
Published: (2024)
Frame-Aligned Fusion of Canary and WavLM for Non-Intrusive Intelligibility Prediction of Hearing-Aid-Processed Speech
by: Nakazawa, Kazushi
Published: (2026)
Versatile audio-visual learning for emotion recognition
by: Goncalves, Lucas, et al.
Published: (2023)
Speaker anonymization using neural audio codec language models
by: Panariello, Michele, et al.
Published: (2023)
FxSearcher: gradient-free text-driven audio transformation
by: Ki, Hojoon, et al.
Published: (2025)
Regularized autoregressive modeling and its application to audio signal reconstruction
by: Mokrý, Ondřej, et al.
Published: (2024)
EDTC: enhance depth of text comprehension in automated audio captioning
by: Tan, Liwen, et al.
Published: (2024)
Quality of Automatic Speech Recognition -- Polish Language case study -- from Wav2Vec to Scribe ElevenLabs
by: Pietroń, Marcin, et al.
Published: (2026)
XWSB: A Blend System Utilizing XLS-R and WavLM with SLS Classifier detection system for SVDD 2024 Challenge
by: Zhang, Qishan, et al.
Published: (2024)
STASE: A spatialized text-to-audio synthesis engine for music generation
by: Chi, Tutti, et al.
Published: (2025)
Enhancement by postfiltering for speech and audio coding in ad-hoc sensor networks
by: Das, Sneha, et al.
Published: (2020)
Human-CLAP: Human-perception-based contrastive language-audio pretraining
by: Takano, Taisei, et al.
Published: (2025)
DashengTokenizer: One layer is enough for unified audio understanding and generation
by: Dinkel, Heinrich, et al.
Published: (2026)
EnCodecMAE: Leveraging neural codecs for universal audio representation learning
by: Pepino, Leonardo, et al.
Published: (2023)
AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)
RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio
by: Kanamori, Yusuke, et al.
Published: (2025)
ICGAN: An implicit conditioning method for interpretable feature control of neural audio synthesis
by: Liu, Yunyi, et al.
Published: (2024)
Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline
by: Riley, Xavier, et al.
Published: (2024)