Bibliographic Details
Main Authors: Yuan, Zhongju, Wiggins, Geraint, Botteldooren, Dick
Format: Preprint
Published: 2026
Subjects: Sound, Artificial Intelligence
Online Access: https://arxiv.org/abs/2605.13651
_version_ 1866911681078099968
author Yuan, Zhongju
Wiggins, Geraint
Botteldooren, Dick
contents Audio provides critical situational cues, yet current Audio Language Models (ALMs) face an attention bottleneck in long-form recordings, where dominant background patterns can dilute rare, salient events. We introduce NAACA, a training-free NeuroAuditory Attentive Cognitive Architecture that reframes attention allocation as an auditory salience filtering problem. At its core is OWM, a neuro-inspired Oscillatory Working Memory that maintains stable attractor-like states and triggers higher-level ALM reasoning only when adaptive energy fluctuations signal perceptual salience. On XD-Violence, NAACA improves AudioQwen's average precision (AP) from 53.50% to 70.60% while reducing unnecessary ALM invocations. Furthermore, qualitative case studies on the Urban Soundscapes of the World (USoW) dataset show that OWM captures novel events and subcategory shifts while remaining robust to transient pauses and ambient urban noise.
format Preprint
id arxiv_https___arxiv_org_abs_2605_13651
institution arXiv
publishDate 2026
record_format arxiv
title NAACA: Training-Free NeuroAuditory Attentive Cognitive Architecture with Oscillatory Working Memory for Salience-Driven Attention Gating
topic Sound
Artificial Intelligence
url https://arxiv.org/abs/2605.13651