| Main Authors: | Rascon, Caleb, Gato-Diaz, Luis, García-Alarcón, Eduardo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.02755 |
Similar Items
Direction of Arrival Correction through Speech Quality Feedback
by: Rascon, Caleb
Published: (2024)
Scattering Transform for Auditory Attention Decoding
by: Pallenberg, René, et al.
Published: (2026)
PlumberNet: Fixing interference leakage after GEV beamforming
by: Grondin, François, et al.
Published: (2023)
Auditory Intelligence: Understanding the World Through Sound
by: Nam, Hyeonuk
Published: (2025)
Moravec's Paradox: Towards an Auditory Turing Test
by: Noever, David, et al.
Published: (2025)
APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech
by: Lian, Zhicheng, et al.
Published: (2025)
The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMS
by: Carone, Brandon James, et al.
Published: (2025)
A General Close-loop Predictive Coding Framework for Auditory Working Memory
by: Yuan, Zhongju, et al.
Published: (2025)
Scaling Auditory Cognition via Test-Time Compute in Audio Language Models
by: Dang, Ting, et al.
Published: (2025)
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
by: Jiang, Xilin, et al.
Published: (2025)
SWIM: Short-Window CNN Integrated with Mamba for EEG-Based Auditory Spatial Attention Decoding
by: Zhang, Ziyang, et al.
Published: (2024)
DeepASA: An Object-Oriented Multi-Purpose Network for Auditory Scene Analysis
by: Lee, Dongheon, et al.
Published: (2025)
AudioScene: Integrating Object-Event Audio into 3D Scenes
by: Yuan, Shuaihang, et al.
Published: (2025)
CoComposer: LLM Multi-agent Collaborative Music Composition
by: Xing, Peiwen, et al.
Published: (2025)
Sound Scene Synthesis at the DCASE 2024 Challenge
by: Lagrange, Mathieu, et al.
Published: (2025)
Toward a Realistic Encoding Model of Auditory Affective Understanding in the Brain
by: Pan, Guandong, et al.
Published: (2025)
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2025)
DARNet: Dual Attention Refinement Network with Spatiotemporal Construction for Auditory Attention Detection
by: Yan, Sheng, et al.
Published: (2024)
Deep Space Separable Distillation for Lightweight Acoustic Scene Classification
by: Ye, ShuQi, et al.
Published: (2024)
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2025)
RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios
by: Shao, Yiwen, et al.
Published: (2023)
SoundCompass: Navigating Target Sound Extraction With Effective Directional Clue Integration In Complex Acoustic Scenes
by: Choi, Dayun, et al.
Published: (2025)
Safe Guard: an LLM-agent for Real-time Voice-based Hate Speech Detection in Social Virtual Reality
by: Xu, Yiwen, et al.
Published: (2024)
Beyond Discrete Categories: Multi-Task Valence-Arousal Modeling for Pet Vocalization Analysis
by: Huang, Junyao, et al.
Published: (2025)
Enhancing Music Genre Classification through Multi-Algorithm Analysis and User-Friendly Visualization
by: Kamuni, Navin, et al.
Published: (2024)
Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis
by: Kim, Minsu, et al.
Published: (2025)
Advancing Multi-talker ASR Performance with Large Language Models
by: Shi, Mohan, et al.
Published: (2024)
Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data
by: Postma, Emmy, et al.
Published: (2025)
Fitting Auditory Filterbanks with Multiresolution Neural Networks
by: Lostanlen, Vincent, et al.
Published: (2023)
Leveraging Spatial Cues from Cochlear Implant Microphones to Efficiently Enhance Speech Separation in Real-World Listening Scenes
by: Olalere, Feyisayo, et al.
Published: (2025)
Explainable Deep Learning Analysis for Raga Identification in Indian Art Music
by: Singh, Parampreet, et al.
Published: (2024)
Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
by: Xie, Jiamin, et al.
Published: (2025)
Dual-branch Graph Domain Adaptation for Cross-scenario Multi-modal Emotion Recognition
by: Shou, Yuntao, et al.
Published: (2026)
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders
by: Cao, Yubing, et al.
Published: (2024)
A Multi-modal Approach to Dysarthria Detection and Severity Assessment Using Speech and Text Information
by: M, Anuprabha, et al.
Published: (2024)
Layer-aware TDNN: Speaker Recognition Using Multi-Layer Features from Pre-Trained Models
by: Kim, Jin Sob, et al.
Published: (2024)
BSS-CFFMA: Cross-Domain Feature Fusion and Multi-Attention Speech Enhancement Network based on Self-Supervised Embedding
by: Mattursun, Alimjan, et al.
Published: (2024)
IS$^3$: Generic Impulsive-Stationary Sound Separation in Acoustic Scenes using Deep Filtering
by: Berger, Clémentine, et al.
Published: (2025)
Enhancing GOP in CTC-Based Mispronunciation Detection with Phonological Knowledge
by: Parikh, Aditya Kamlesh, et al.
Published: (2025)
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
by: Bak, Taejun, et al.
Published: (2024)