:: Library Catalog

Bibliographic Details
Main Authors:	Yuan, Zhongju; Wiggins, Geraint; Botteldooren, Dick
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.13651

Similar Items

A General Close-loop Predictive Coding Framework for Auditory Working Memory
by: Yuan, Zhongju, et al.
Published: (2025)

BioOSS: A Bio-Inspired Oscillatory State System with Spatio-Temporal Dynamics
by: Yuan, Zhongju, et al.
Published: (2025)

A novel Reservoir Architecture for Periodic Time Series Prediction
by: Yuan, Zhongju, et al.
Published: (2024)

A Reservoir-based Model for Human-like Perception of Complex Rhythm Pattern
by: Yuan, Zhongju, et al.
Published: (2025)

A Dynamic Systems Approach to Modelling Human-Machine Rhythm Interaction
by: Yuan, Zhongju, et al.
Published: (2024)

Yin-Yang: Developing Motifs With Long-Term Structure And Controllability
by: Bhandari, Keshav, et al.
Published: (2025)

Tidal MerzA: Combining affective modelling and autonomous code generation through Reinforcement Learning
by: Wilson, Elizabeth, et al.
Published: (2024)

MIDI-VALLE: Improving Expressive Piano Performance Synthesis Through Neural Codec Language Modelling
by: Tang, Jingjing, et al.
Published: (2025)

NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention
by: De Silva, Dashanka, et al.
Published: (2024)

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
by: Jiang, Xilin, et al.
Published: (2025)

Scaling Auditory Cognition via Test-Time Compute in Audio Language Models
by: Dang, Ting, et al.
Published: (2025)

DARNet: Dual Attention Refinement Network with Spatiotemporal Construction for Auditory Attention Detection
by: Yan, Sheng, et al.
Published: (2024)

Fusing Memory and Attention: A study on LSTM, Transformer and Hybrid Architectures for Symbolic Music Generation
by: Ghoshal, Soudeep, et al.
Published: (2026)

DRASP: A Dual-Resolution Attentive Statistics Pooling Framework for Automatic MOS Prediction
by: Yang, Cheng-Yeh, et al.
Published: (2025)

AudioMotionBench: Evaluating Auditory Motion Perception in Audio LLMs
by: Sun, Zhe, et al.
Published: (2025)

Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception
by: Xie, Yuankun, et al.
Published: (2025)

Parallel Delayed Memory Units for Enhanced Temporal Modeling in Biomedical and Bioacoustic Signal Analysis
by: Sun, Pengfei, et al.
Published: (2025)

Scattering Transformer: A Training-Free Transformer Architecture for Heart Murmur Detection
by: Zewail, Rami
Published: (2025)

AuditoryBench++: Can Language Models Understand Auditory Knowledge without Hearing?
by: Ok, Hyunjong, et al.
Published: (2025)

DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs
by: Papi, Sara, et al.
Published: (2026)

LoopGen: Training-Free Loopable Music Generation
by: Marincione, Davide, et al.
Published: (2025)

AST: Adaptive, Seamless, and Training-Free Precise Speech Editing
by: Lv, Sihan, et al.
Published: (2026)

Auditory Intelligence: Understanding the World Through Sound
by: Nam, Hyeonuk
Published: (2025)

Moravec's Paradox: Towards an Auditory Turing Test
by: Noever, David, et al.
Published: (2025)

SWIM: Short-Window CNN Integrated with Mamba for EEG-Based Auditory Spatial Attention Decoding
by: Zhang, Ziyang, et al.
Published: (2024)

ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings
by: Mariotte, Theo, et al.
Published: (2024)

Temporal Contrastive Decoding: A Training-Free Method for Large Audio-Language Models
by: Li, Yanda, et al.
Published: (2026)

End-to-end Topographic Auditory Models Replicate Signatures of Human Auditory Cortex
by: Al-Tahan, Haider, et al.
Published: (2025)

Flamed-TTS: Flow Matching Attention-Free Models for Efficient Generating and Dynamic Pacing Zero-shot Text-to-Speech
by: Huynh-Nguyen, Hieu-Nghia, et al.
Published: (2025)

AUREXA-SE: Audio-Visual Unified Representation Exchange Architecture with Cross-Attention and Squeezeformer for Speech Enhancement
by: Sajid, M., et al.
Published: (2025)

APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech
by: Lian, Zhicheng, et al.
Published: (2025)

Neuro-MSBG: An End-to-End Neural Model for Hearing Loss Simulation
by: Yuan, Hui-Guan, et al.
Published: (2025)

Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection
by: Chen, Meng, et al.
Published: (2026)

TFGA-Net: Temporal-Frequency Graph Attention Network for Brain-Controlled Speaker Extraction
by: Si, Youhao, et al.
Published: (2025)

The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMS
by: Carone, Brandon James, et al.
Published: (2025)

Speech Emotion Recognition Leveraging OpenAI's Whisper Representations and Attentive Pooling Methods
by: Shendabadi, Ali, et al.
Published: (2026)

Lina-Speech: Gated Linear Attention and Initial-State Tuning for Multi-Sample Prompting Text-To-Speech Synthesis
by: Lemerle, Théodor, et al.
Published: (2024)

DGFNet: End-to-End Audio-Visual Source Separation Based on Dynamic Gating Fusion
by: Yu, Yinfeng, et al.
Published: (2025)

Automatic Speech Recognition in the Modern Era: Architectures, Training, and Evaluation
by: Nayeem, Md., et al.
Published: (2025)

Single-word Auditory Attention Decoding Using Deep Learning Model
by: Nguyen, Nhan Duc Thanh, et al.
Published: (2024)