Library Catalog

Bibliographic Details
Main Authors:	Stylianou, Ioannis, Francombe, Jon, Martinez-Nuevo, Pablo, Shepstone, Sven Ewan, Tan, Zheng-Hua
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.09448

Similar Items

LibriVAD: A Scalable Open Dataset with Deep Learning Benchmarks for Voice Activity Detection
by: Stylianou, Ioannis, et al.
Published: (2025)

Data Aware Differentiable Neural Architecture Search for Tiny Keyword Spotting Applications
by: Shi, Yujia, et al.
Published: (2025)

Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping
by: Khanal, Subash, et al.
Published: (2025)

BNMusic: Blending Environmental Noises into Personalized Music
by: Zuo, Chi, et al.
Published: (2025)

The Equalizer: Introducing Shape-Gain Decomposition in Neural Audio Codecs
by: Sadok, Samir, et al.
Published: (2026)

DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models
by: Wilkinghoff, Kevin, et al.
Published: (2025)

Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering
by: Glazer, Neta, et al.
Published: (2026)

Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition
by: Ginjala, Srishti, et al.
Published: (2026)

Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
by: Yadav, Sarthak, et al.
Published: (2024)

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation
by: Xiong, Chenxu, et al.
Published: (2024)

Mind the Prompt: Prompting Strategies in Audio Generations for Improving Sound Classification
by: Ronchini, Francesca, et al.
Published: (2025)

AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)

An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)

Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels
by: Weng, Yuzhe, et al.
Published: (2026)

Joint Minimum Processing Beamforming and Near-end Listening Enhancement
by: Fuglsig, Andreas J., et al.
Published: (2023)

Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation
by: Tripathi, Kumud, et al.
Published: (2025)

TopSeg: A Multi-Scale Topological Framework for Data-Efficient Heart Sound Segmentation
by: Zhang, Peihong, et al.
Published: (2025)

Learning When to Think While Listening in Large Audio-Language Models
by: Song, Zhiyuan, et al.
Published: (2026)

From Sound to Setting: AI-Based Equalizer Parameter Prediction for Piano Tone Replication
by: Yu, Song-Ze
Published: (2025)

'Studies for': A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model
by: Nagashima, Chihiro, et al.
Published: (2025)

SpeechQualityLLM: LLM-Based Multimodal Assessment of Speech Quality
by: Monjur, Mahathir, et al.
Published: (2025)

How Much Does Machine Identity Matter in Anomalous Sound Detection at Test Time?
by: Wilkinghoff, Kevin, et al.
Published: (2026)

AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech
by: Qiu, Jielin, et al.
Published: (2026)

Mind the Gap: Detecting Cluster Exits for Robust Local Density-Based Score Normalization in Anomalous Sound Detection
by: Wilkinghoff, Kevin, et al.
Published: (2026)

MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement
by: Kühne, Nikolai Lund, et al.
Published: (2025)

Exploring Resolution-Wise Shared Attention in Hybrid Mamba-U-Nets for Improved Cross-Corpus Speech Enhancement
by: Kühne, Nikolai Lund, et al.
Published: (2025)

xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement
by: Kühne, Nikolai Lund, et al.
Published: (2025)

Voice Attribute Editing with Text Prompt
by: Sheng, Zhengyan, et al.
Published: (2024)

Temporal Pooling Strategies for Training-Free Anomalous Sound Detection with Self-Supervised Audio Embeddings
by: Wilkinghoff, Kevin, et al.
Published: (2026)

Woosh: A Sound Effects Foundation Model
by: Hadjeres, Gaëtan, et al.
Published: (2026)

Language Model Can Listen While Speaking
by: Ma, Ziyang, et al.
Published: (2024)

Pediatric Asthma Detection with Googles HeAR Model: An AI-Driven Respiratory Sound Classifier
by: Ehtesham, Abul, et al.
Published: (2025)

Listen to Extract: Onset-Prompted Target Speaker Extraction
by: Shen, Pengjie, et al.
Published: (2025)

Towards Open World Sound Event Detection
by: Hai, P. H., et al.
Published: (2026)

Bird detection in audio: a survey and a challenge
by: Stowell, Dan, et al.
Published: (2016)

SoundSignature: What Type of Music Do You Like?
by: Carone, Brandon James, et al.
Published: (2024)

CycleGuardian: A Framework for Automatic RespiratorySound classification Based on Improved Deep clustering and Contrastive Learning
by: Chu, Yun, et al.
Published: (2025)

Vocal Tract Length Warped Features for Spoken Keyword Spotting
by: Sarkar, Achintya kr., et al.
Published: (2025)

Learning How to Listen: A Temporal-Frequential Attention Model for Sound Event Detection
by: Shen, Yu-Han, et al.
Published: (2018)

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
by: Du, Zhihao, et al.
Published: (2023)