:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Xuanhao; Li, Chang
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.04547

Similar Items

Prompt-aware classifier free guidance for diffusion models
by: Zhang, Xuanhao, et al.
Published: (2025)

Making deep neural networks work for medical audio: representation, compression and domain adaptation
by: Onu, Charles C
Published: (2025)

Keep what you need: extracting efficient subnetworks from large audio representation models
by: Genova, David, et al.
Published: (2025)

SS-DPPN: A self-supervised dual-path foundation model for the generalizable cardiac audio representation
by: Muna, Ummy Maria, et al.
Published: (2025)

Sustaining model performance for COVID-19 detection from dynamic audio data: Development and evaluation of a comprehensive drift-adaptive framework
by: Ganitidis, Theofanis, et al.
Published: (2024)

Mellow: a small audio language model for reasoning
by: Deshmukh, Soham, et al.
Published: (2025)

Multi-layer attentive probing improves transfer of audio representations for bioacoustics
by: Miron, Marius, et al.
Published: (2026)

Where are we in audio deepfake detection? A systematic analysis over generative and detection models
by: Li, Xiang, et al.
Published: (2024)

Switchcodec: Adaptive residual-expert sparse quantization for high-fidelity neural audio coding
by: Wang, Xiangbo, et al.
Published: (2026)

GRAM: Spatial general-purpose audio representation models for real-world applications
by: Yuksel, Goksenin, et al.
Published: (2025)

DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
by: Zhao, Lei, et al.
Published: (2025)

ADIFF: Explaining audio difference using natural language
by: Deshmukh, Soham, et al.
Published: (2025)

AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
by: Liu, Zihan, et al.
Published: (2025)

Learning to reconstruct from saturated data: audio declipping and high-dynamic range imaging
by: Sechaud, Victor, et al.
Published: (2026)

An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)

AISTAT lab system for DCASE2025 Task6: Language-based audio retrieval
by: Kim, Hyun Jun, et al.
Published: (2025)

Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
by: Roman, Adrian S., et al.
Published: (2024)

Recomposer: Event-roll-guided generative audio editing
by: Ellis, Daniel P. W., et al.
Published: (2025)

DAST: A Dual-Stream Voice Anonymization Attacker with Staged Training
by: Arefeen, Ridwan, et al.
Published: (2026)

Forensic deepfake audio detection using segmental speech features
by: Yang, Tianle, et al.
Published: (2025)

NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
by: Robinson, David, et al.
Published: (2024)

Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering
by: Zhao, Jinghua, et al.
Published: (2025)

A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification
by: Olvera, Michel, et al.
Published: (2024)

Exploring bat song syllable representations in self-supervised audio encoders
by: Kloots, Marianne de Heer, et al.
Published: (2024)

Mask2Flow-TSE: Two-Stage Target Speaker Extraction with Masking and Flow Matching
by: Moon, Junwon, et al.
Published: (2026)

Improving Anomalous Sound Detection with Attribute-aware Representation from Domain-adaptive Pre-training
by: Fang, Xin, et al.
Published: (2025)

Joint sentiment analysis of lyrics and audio in music
by: Schaab, Lea, et al.
Published: (2024)

MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2026)

Stage-Adaptive Reliability Modeling for Continuous Valence-Arousal Estimation
by: Lee, Yubeen, et al.
Published: (2026)

Discriminating real and synthetic super-resolved audio samples using embedding-based classifiers
by: Silaev, Mikhail, et al.
Published: (2026)

Discriminant audio properties in deep learning based respiratory insufficiency detection in Brazilian Portuguese
by: Gauy, Marcelo Matheus, et al.
Published: (2024)

Supervised contrastive learning from weakly-labeled audio segments for musical version matching
by: Serrà, Joan, et al.
Published: (2025)

AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
by: Wang, Junyou, et al.
Published: (2025)

End-to-end audio-visual learning for cochlear implant sound coding simulations in noisy environments
by: Lin, Meng-Ping, et al.
Published: (2025)

A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement
by: Lu, Shenghui, et al.
Published: (2025)

Eliminating stability hallucinations in LLM-based TTS models via attention guidance
by: Wang, ShiMing, et al.
Published: (2025)

Rebellion: Noise-Robust Reasoning Training for Audio Reasoning Models
by: Huang, Tiansheng, et al.
Published: (2025)

Hardware-accelerated graph neural networks: an alternative approach for neuromorphic event-based audio classification and keyword spotting on SoC FPGA
by: Jeziorek, Kamil, et al.
Published: (2026)

Stage-Wise and Prior-Aware Neural Speech Phase Prediction
by: Liu, Fei, et al.
Published: (2024)