:: Library Catalog

Bibliographic Details
Main Authors:	Yuksel, Goksenin; Guetschel, Pierre; Tangermann, Michael; van Gerven, Marcel; van der Heijden, Kiki
Format:	Preprint
Published:	2025
Subjects:	Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2509.23238

Similar Items

GRAM: Spatial general-purpose audio representation models for real-world applications
by: Yuksel, Goksenin, et al.
Published: (2025)

GRAM: Spatial general-purpose audio representations for real-world environments
by: Yuksel, Goksenin, et al.
Published: (2026)

Leveraging Spatial Cues from Cochlear Implant Microphones to Efficiently Enhance Speech Separation in Real-World Listening Scenes
by: Olalere, Feyisayo, et al.
Published: (2025)

Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments
by: Ledder, Wessel, et al.
Published: (2024)

BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization
by: Kuang, Sheng, et al.
Published: (2022)

Scaling up masked audio encoder learning for general audio classification
by: Dinkel, Heinrich, et al.
Published: (2024)

A robust audio deepfake detection system via multi-view feature
by: Yang, Yujie, et al.
Published: (2024)

Wav2Small: Distilling Wav2Vec2 to 72K parameters for Low-Resource Speech emotion recognition
by: Kounadis-Bastian, Dionyssos, et al.
Published: (2024)

Decodable but not structured: linear probing enables Underwater Acoustic Target Recognition with pretrained audio embeddings
by: Hummel, Hilde I., et al.
Published: (2026)

AxLSTMs: learning self-supervised audio representations with xLSTMs
by: Yadav, Sarthak, et al.
Published: (2024)

Deep learning based spatial aliasing reduction in beamforming for audio capture
by: Guzik, Mateusz, et al.
Published: (2025)

Efficient learning-based sound propagation for virtual and real-world audio processing applications
by: Ratnarajah, Anton Jeran
Published: (2024)

Self-supervised learning method using multiple sampling strategies for general-purpose audio representation
by: Kuroyanagi, Ibuki, et al.
Published: (2025)

Towards audio language modeling -- an overview
by: Wu, Haibin, et al.
Published: (2024)

WavMark: Watermarking for Audio Generation
by: Chen, Guangyu, et al.
Published: (2023)

Are audio DeepFake detection models polyglots?
by: Marek, Bartłomiej, et al.
Published: (2024)

Probing mental health information in speech foundation models
by: de Gennes, Marc, et al.
Published: (2024)

ManWav: The First Manchu ASR Model
by: Seo, Jean, et al.
Published: (2024)

Adapting WavLM for Speech Emotion Recognition
by: Diatlova, Daria, et al.
Published: (2024)

Tweaking autoregressive methods for inpainting of gaps in audio signals
by: Mokrý, Ondřej, et al.
Published: (2024)

MBCodec: Thorough disentangle for high-fidelity audio compression
by: Zhang, Ruonan, et al.
Published: (2025)

Real-time implementation of vibrato transfer as an audio effect
by: Hyrkas, Jeremy
Published: (2025)

Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
by: Deng, Keqi, et al.
Published: (2024)

Frame-Aligned Fusion of Canary and WavLM for Non-Intrusive Intelligibility Prediction of Hearing-Aid-Processed Speech
by: Nakazawa, Kazushi
Published: (2026)

Versatile audio-visual learning for emotion recognition
by: Goncalves, Lucas, et al.
Published: (2023)

Speaker anonymization using neural audio codec language models
by: Panariello, Michele, et al.
Published: (2023)

FxSearcher: gradient-free text-driven audio transformation
by: Ki, Hojoon, et al.
Published: (2025)

Regularized autoregressive modeling and its application to audio signal reconstruction
by: Mokrý, Ondřej, et al.
Published: (2024)

EDTC: enhance depth of text comprehension in automated audio captioning
by: Tan, Liwen, et al.
Published: (2024)

Quality of Automatic Speech Recognition -- Polish Language case study -- from Wav2Vec to Scribe ElevenLabs
by: Pietroń, Marcin, et al.
Published: (2026)

XWSB: A Blend System Utilizing XLS-R and WavLM with SLS Classifier detection system for SVDD 2024 Challenge
by: Zhang, Qishan, et al.
Published: (2024)

STASE: A spatialized text-to-audio synthesis engine for music generation
by: Chi, Tutti, et al.
Published: (2025)

Enhancement by postfiltering for speech and audio coding in ad-hoc sensor networks
by: Das, Sneha, et al.
Published: (2020)

Human-CLAP: Human-perception-based contrastive language-audio pretraining
by: Takano, Taisei, et al.
Published: (2025)

DashengTokenizer: One layer is enough for unified audio understanding and generation
by: Dinkel, Heinrich, et al.
Published: (2026)

EnCodecMAE: Leveraging neural codecs for universal audio representation learning
by: Pepino, Leonardo, et al.
Published: (2023)

AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)

RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio
by: Kanamori, Yusuke, et al.
Published: (2025)

ICGAN: An implicit conditioning method for interpretable feature control of neural audio synthesis
by: Liu, Yunyi, et al.
Published: (2024)

Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline
by: Riley, Xavier, et al.
Published: (2024)