:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Xiangbo, Jiang, Wenbin, Wang, Jin, You, Yubo, Fang, Sheng, Wen, Fei
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.20362

Similar Items

SwitchCodec: A High-Fidelity Neural Audio Codec With Sparse Quantization
by: Wang, Jin, et al.
Published: (2025)

TQCodec: Towards neural audio codec for high-fidelity music streaming
by: He, Lixing, et al.
Published: (2026)

Stage-adaptive audio diffusion modeling
by: Zhang, Xuanhao, et al.
Published: (2026)

Making deep neural networks work for medical audio: representation, compression and domain adaptation
by: Onu, Charles C
Published: (2025)

An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)

Multi-layer attentive probing improves transfer of audio representations for bioacoustics
by: Miron, Marius, et al.
Published: (2026)

Learning to reconstruct from saturated data: audio declipping and high-dynamic range imaging
by: Sechaud, Victor, et al.
Published: (2026)

USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
by: Wang, Wenbin, et al.
Published: (2024)

Keep what you need: extracting efficient subnetworks from large audio representation models
by: Genova, David, et al.
Published: (2025)

SS-DPPN: A self-supervised dual-path foundation model for the generalizable cardiac audio representation
by: Muna, Ummy Maria, et al.
Published: (2025)

ADIFF: Explaining audio difference using natural language
by: Deshmukh, Soham, et al.
Published: (2025)

Mellow: a small audio language model for reasoning
by: Deshmukh, Soham, et al.
Published: (2025)

End-to-end audio-visual learning for cochlear implant sound coding simulations in noisy environments
by: Lin, Meng-Ping, et al.
Published: (2025)

Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering
by: Zhao, Jinghua, et al.
Published: (2025)

MBCodec: Thorough disentangle for high-fidelity audio compression
by: Zhang, Ruonan, et al.
Published: (2025)

SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture
by: Sui, Kehan, et al.
Published: (2025)

Hardware-accelerated graph neural networks: an alternative approach for neuromorphic event-based audio classification and keyword spotting on SoC FPGA
by: Jeziorek, Kamil, et al.
Published: (2026)

RAS: a Reliability Oriented Metric for Automatic Speech Recognition
by: Huang, Wenbin, et al.
Published: (2026)

Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning
by: Yang, Dongchao, et al.
Published: (2025)

AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)

GRAM: Spatial general-purpose audio representation models for real-world applications
by: Yuksel, Goksenin, et al.
Published: (2025)

Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues
by: Nasr, Seham, et al.
Published: (2025)

AISTAT lab system for DCASE2025 Task6: Language-based audio retrieval
by: Kim, Hyun Jun, et al.
Published: (2025)

Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
by: Roman, Adrian S., et al.
Published: (2024)

Recomposer: Event-roll-guided generative audio editing
by: Ellis, Daniel P. W., et al.
Published: (2025)

AST: Adaptive, Seamless, and Training-Free Precise Speech Editing
by: Lv, Sihan, et al.
Published: (2026)

Where are we in audio deepfake detection? A systematic analysis over generative and detection models
by: Li, Xiang, et al.
Published: (2024)

DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
by: Zhao, Lei, et al.
Published: (2025)

Forensic deepfake audio detection using segmental speech features
by: Yang, Tianle, et al.
Published: (2025)

NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
by: Robinson, David, et al.
Published: (2024)

A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification
by: Olvera, Michel, et al.
Published: (2024)

Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
by: Chen, Jinming, et al.
Published: (2024)

Keyword spotting using convolutional neural network for speech recognition in Hindi
by: Bharti, Saru, et al.
Published: (2026)

Exploring bat song syllable representations in self-supervised audio encoders
by: Kloots, Marianne de Heer, et al.
Published: (2024)

TimberAgent: Gram-Guided Retrieval for Executable Music Effect Control
by: He, Shihao, et al.
Published: (2026)

Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention
by: Wang, Cong, et al.
Published: (2025)

A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
by: Zhao, Dongdi, et al.
Published: (2024)

Joint sentiment analysis of lyrics and audio in music
by: Schaab, Lea, et al.
Published: (2024)

Adaptive Accompaniment with ReaLchords
by: Wu, Yusong, et al.
Published: (2025)

MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2026)