| Main Authors: | Zhang, Xuanhao, Li, Chang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.04547 |
Similar Items
Prompt-aware classifier free guidance for diffusion models
by: Zhang, Xuanhao, et al.
Published: (2025)
Making deep neural networks work for medical audio: representation, compression and domain adaptation
by: Onu, Charles C
Published: (2025)
Keep what you need : extracting efficient subnetworks from large audio representation models
by: Genova, David, et al.
Published: (2025)
SS-DPPN: A self-supervised dual-path foundation model for the generalizable cardiac audio representation
by: Muna, Ummy Maria, et al.
Published: (2025)
Sustaining model performance for covid-19 detection from dynamic audio data: Development and evaluation of a comprehensive drift-adaptive framework
by: Ganitidis, Theofanis, et al.
Published: (2024)
Mellow: a small audio language model for reasoning
by: Deshmukh, Soham, et al.
Published: (2025)
Multi-layer attentive probing improves transfer of audio representations for bioacoustics
by: Miron, Marius, et al.
Published: (2026)
Where are we in audio deepfake detection? A systematic analysis over generative and detection models
by: Li, Xiang, et al.
Published: (2024)
Switchcodec: Adaptive residual-expert sparse quantization for high-fidelity neural audio coding
by: Wang, Xiangbo, et al.
Published: (2026)
GRAM: Spatial general-purpose audio representation models for real-world applications
by: Yuksel, Goksenin, et al.
Published: (2025)
DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
by: Zhao, Lei, et al.
Published: (2025)
ADIFF: Explaining audio difference using natural language
by: Deshmukh, Soham, et al.
Published: (2025)
AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
by: Liu, Zihan, et al.
Published: (2025)
Learning to reconstruct from saturated data: audio declipping and high-dynamic range imaging
by: Sechaud, Victor, et al.
Published: (2026)
An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)
AISTAT lab system for DCASE2025 Task6: Language-based audio retrieval
by: Kim, Hyun Jun, et al.
Published: (2025)
Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
by: Roman, Adrian S., et al.
Published: (2024)
Recomposer: Event-roll-guided generative audio editing
by: Ellis, Daniel P. W., et al.
Published: (2025)
DAST: A Dual-Stream Voice Anonymization Attacker with Staged Training
by: Arefeen, Ridwan, et al.
Published: (2026)
Forensic deepfake audio detection using segmental speech features
by: Yang, Tianle, et al.
Published: (2025)
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
by: Robinson, David, et al.
Published: (2024)
Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering
by: Zhao, Jinghua, et al.
Published: (2025)
A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification
by: Olvera, Michel, et al.
Published: (2024)
Exploring bat song syllable representations in self-supervised audio encoders
by: Kloots, Marianne de Heer, et al.
Published: (2024)
Mask2Flow-TSE: Two-Stage Target Speaker Extraction with Masking and Flow Matching
by: Moon, Junwon, et al.
Published: (2026)
Improving Anomalous Sound Detection with Attribute-aware Representation from Domain-adaptive Pre-training
by: Fang, Xin, et al.
Published: (2025)
Joint sentiment analysis of lyrics and audio in music
by: Schaab, Lea, et al.
Published: (2024)
MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2026)
Stage-Adaptive Reliability Modeling for Continuous Valence-Arousal Estimation
by: Lee, Yubeen, et al.
Published: (2026)
Discriminating real and synthetic super-resolved audio samples using embedding-based classifiers
by: Silaev, Mikhail, et al.
Published: (2026)
Discriminant audio properties in deep learning based respiratory insufficiency detection in Brazilian Portuguese
by: Gauy, Marcelo Matheus, et al.
Published: (2024)
Supervised contrastive learning from weakly-labeled audio segments for musical version matching
by: Serrà, Joan, et al.
Published: (2025)
AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
by: Wang, Junyou, et al.
Published: (2025)
End-to-end audio-visual learning for cochlear implant sound coding simulations in noisy environments
by: Lin, Meng-Ping, et al.
Published: (2025)
A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement
by: Lu, Shenghui, et al.
Published: (2025)
Eliminating stability hallucinations in llm-based tts models via attention guidance
by: Wang, ShiMing, et al.
Published: (2025)
Rebellion: Noise-Robust Reasoning Training for Audio Reasoning Models
by: Huang, Tiansheng, et al.
Published: (2025)
Hardware-accelerated graph neural networks: an alternative approach for neuromorphic event-based audio classification and keyword spotting on SoC FPGA
by: Jeziorek, Kamil, et al.
Published: (2026)
Stage-Wise and Prior-Aware Neural Speech Phase Prediction
by: Liu, Fei, et al.
Published: (2024)