| Main Authors: | Stylianou, Ioannis, Francombe, Jon, Martinez-Nuevo, Pablo, Shepstone, Sven Ewan, Tan, Zheng-Hua |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.09448 |
Similar Items
LibriVAD: A Scalable Open Dataset with Deep Learning Benchmarks for Voice Activity Detection
by: Stylianou, Ioannis, et al.
Published: (2025)
Data Aware Differentiable Neural Architecture Search for Tiny Keyword Spotting Applications
by: Shi, Yujia, et al.
Published: (2025)
Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping
by: Khanal, Subash, et al.
Published: (2025)
BNMusic: Blending Environmental Noises into Personalized Music
by: Zuo, Chi, et al.
Published: (2025)
The Equalizer: Introducing Shape-Gain Decomposition in Neural Audio Codecs
by: Sadok, Samir, et al.
Published: (2026)
DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models
by: Wilkinghoff, Kevin, et al.
Published: (2025)
Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering
by: Glazer, Neta, et al.
Published: (2026)
Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition
by: Ginjala, Srishti, et al.
Published: (2026)
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
by: Yadav, Sarthak, et al.
Published: (2024)
Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation
by: Xiong, Chenxu, et al.
Published: (2024)
Mind the Prompt: Prompting Strategies in Audio Generations for Improving Sound Classification
by: Ronchini, Francesca, et al.
Published: (2025)
AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)
An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)
Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels
by: Weng, Yuzhe, et al.
Published: (2026)
Joint Minimum Processing Beamforming and Near-end Listening Enhancement
by: Fuglsig, Andreas J., et al.
Published: (2023)
Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation
by: Tripathi, Kumud, et al.
Published: (2025)
TopSeg: A Multi-Scale Topological Framework for Data-Efficient Heart Sound Segmentation
by: Zhang, Peihong, et al.
Published: (2025)
Learning When to Think While Listening in Large Audio-Language Models
by: Song, Zhiyuan, et al.
Published: (2026)
From Sound to Setting: AI-Based Equalizer Parameter Prediction for Piano Tone Replication
by: Yu, Song-Ze
Published: (2025)
'Studies for': A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model
by: Nagashima, Chihiro, et al.
Published: (2025)
SpeechQualityLLM: LLM-Based Multimodal Assessment of Speech Quality
by: Monjur, Mahathir, et al.
Published: (2025)
How Much Does Machine Identity Matter in Anomalous Sound Detection at Test Time?
by: Wilkinghoff, Kevin, et al.
Published: (2026)
AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech
by: Qiu, Jielin, et al.
Published: (2026)
Mind the Gap: Detecting Cluster Exits for Robust Local Density-Based Score Normalization in Anomalous Sound Detection
by: Wilkinghoff, Kevin, et al.
Published: (2026)
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement
by: Kühne, Nikolai Lund, et al.
Published: (2025)
Exploring Resolution-Wise Shared Attention in Hybrid Mamba-U-Nets for Improved Cross-Corpus Speech Enhancement
by: Kühne, Nikolai Lund, et al.
Published: (2025)
xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement
by: Kühne, Nikolai Lund, et al.
Published: (2025)
Voice Attribute Editing with Text Prompt
by: Sheng, Zhengyan, et al.
Published: (2024)
Temporal Pooling Strategies for Training-Free Anomalous Sound Detection with Self-Supervised Audio Embeddings
by: Wilkinghoff, Kevin, et al.
Published: (2026)
Woosh: A Sound Effects Foundation Model
by: Hadjeres, Gaëtan, et al.
Published: (2026)
Language Model Can Listen While Speaking
by: Ma, Ziyang, et al.
Published: (2024)
Pediatric Asthma Detection with Googles HeAR Model: An AI-Driven Respiratory Sound Classifier
by: Ehtesham, Abul, et al.
Published: (2025)
Listen to Extract: Onset-Prompted Target Speaker Extraction
by: Shen, Pengjie, et al.
Published: (2025)
Towards Open World Sound Event Detection
by: Hai, P. H., et al.
Published: (2026)
Bird detection in audio: a survey and a challenge
by: Stowell, Dan, et al.
Published: (2016)
SoundSignature: What Type of Music Do You Like?
by: Carone, Brandon James, et al.
Published: (2024)
CycleGuardian: A Framework for Automatic RespiratorySound classification Based on Improved Deep clustering and Contrastive Learning
by: Chu, Yun, et al.
Published: (2025)
Vocal Tract Length Warped Features for Spoken Keyword Spotting
by: Sarkar, Achintya kr., et al.
Published: (2025)
Learning How to Listen: A Temporal-Frequential Attention Model for Sound Event Detection
by: Shen, Yu-Han, et al.
Published: (2018)
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
by: Du, Zhihao, et al.
Published: (2023)