| Main Authors: | Ledder, Wessel; Qin, Yuzhen; van der Heijden, Kiki |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.10048 |
Similar Items
GRAM: Spatial general-purpose audio representation models for real-world applications
by: Yuksel, Goksenin, et al.
Published: (2025)
Leveraging Spatial Cues from Cochlear Implant Microphones to Efficiently Enhance Speech Separation in Real-World Listening Scenes
by: Olalere, Feyisayo, et al.
Published: (2025)
Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation
by: Xu, Jingyi, et al.
Published: (2024)
WavJEPA: Semantic learning unlocks robust audio foundation models for raw waveforms
by: Yuksel, Goksenin, et al.
Published: (2025)
Do Models Hear Like Us? Probing the Representational Alignment of Audio LLMs and Naturalistic EEG
by: Yang, Haoyun, et al.
Published: (2026)
EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation
by: Zhu, Tianheng, et al.
Published: (2025)
BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization
by: Kuang, Sheng, et al.
Published: (2022)
LLM-Guided Reinforcement Learning for Audio-Visual Speech Enhancement
by: Chen, Chih-Ning, et al.
Published: (2026)
DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis
by: Hong, Fa-Ting, et al.
Published: (2024)
ABHINAYA -- A System for Speech Emotion Recognition In Naturalistic Conditions Challenge
by: Dutta, Soumya, et al.
Published: (2025)
LPIPS-AttnWav2Lip: Generic Audio-Driven lip synchronization for Talking Head Generation in the Wild
by: Chen, Zhipeng, et al.
Published: (2026)
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
by: Erol, Mehmet Hamza, et al.
Published: (2024)
Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation
by: Zhou, Xukun, et al.
Published: (2024)
Go witheFlow: Real-time Emotion Driven Audio Effects Modulation
by: Dervakos, Edmund, et al.
Published: (2025)
Audio Mamba: Pretrained Audio State Space Model For Audio Tagging
by: Lin, Jiaju, et al.
Published: (2024)
Audio Atlas: Visualizing and Exploring Audio Datasets
by: Lanzendörfer, Luca A., et al.
Published: (2024)
The Computation of Generalized Embeddings for Underwater Acoustic Target Recognition using Contrastive Learning
by: Hummel, Hilde I., et al.
Published: (2025)
Representation-Regularized Convolutional Audio Transformer for Audio Understanding
by: Han, Bing, et al.
Published: (2026)
Audio Spatially-Guided Fusion for Audio-Visual Navigation
by: Zhou, Xinyu, et al.
Published: (2026)
PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis
by: Xie, Yifan, et al.
Published: (2024)
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
by: Kim, Jaeyeon, et al.
Published: (2024)
Towards Attention-based Contrastive Learning for Audio Spoof Detection
by: Goel, Chirag, et al.
Published: (2024)
Region-Based Optimization in Continual Learning for Audio Deepfake Detection
by: Chen, Yujie, et al.
Published: (2024)
PALM: Few-Shot Prompt Learning for Audio Language Models
by: Hanif, Asif, et al.
Published: (2024)
AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion
by: Zhao, Junqi, et al.
Published: (2025)
DreamAudio: Customized Text-to-Audio Generation with Diffusion Models
by: Yuan, Yi, et al.
Published: (2025)
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
by: Yadav, Sarthak, et al.
Published: (2024)
AudioScene: Integrating Object-Event Audio into 3D Scenes
by: Yuan, Shuaihang, et al.
Published: (2025)
Stable Audio Open
by: Evans, Zach, et al.
Published: (2024)
Unveiling Visual Biases in Audio-Visual Localization Benchmarks
by: Chen, Liangyu, et al.
Published: (2024)
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
by: Song, Zirui, et al.
Published: (2025)
BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval
by: Lu, Zhenyu, et al.
Published: (2024)
SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training
by: Mei, Xinhao, et al.
Published: (2026)
AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning
by: Chen, Liyang, et al.
Published: (2026)
UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models
by: Shi, Qundong, et al.
Published: (2026)
Estimating Musical Surprisal in Audio
by: Bjare, Mathias Rose, et al.
Published: (2025)
Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning
by: Changin, Choi, et al.
Published: (2024)
ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation
by: Feng, Tiantian, et al.
Published: (2024)
Deepfake Audio Detection Using Spectrogram-based Feature and Ensemble of Deep Learning Models
by: Pham, Lam, et al.
Published: (2024)
Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
by: Cai, Pengfei, et al.
Published: (2024)