Saved in:
| Main Authors: | Wang, Xiangbo, Jiang, Wenbin, Wang, Jin, You, Yubo, Fang, Sheng, Wen, Fei |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.20362 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
SwitchCodec: A High-Fidelity Nerual Audio Codec With Sparse Quantization
by: Wang, Jin, et al.
Published: (2025)
by: Wang, Jin, et al.
Published: (2025)
TQCodec: Towards neural audio codec for high-fidelity music streaming
by: He, Lixing, et al.
Published: (2026)
by: He, Lixing, et al.
Published: (2026)
Stage-adaptive audio diffusion modeling
by: Zhang, Xuanhao, et al.
Published: (2026)
by: Zhang, Xuanhao, et al.
Published: (2026)
Making deep neural networks work for medical audio: representation, compression and domain adaptation
by: Onu, Charles C
Published: (2025)
by: Onu, Charles C
Published: (2025)
An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)
by: Yadav, Sarthak, et al.
Published: (2025)
Multi-layer attentive probing improves transfer of audio representations for bioacoustics
by: Miron, Marius, et al.
Published: (2026)
by: Miron, Marius, et al.
Published: (2026)
Learning to reconstruct from saturated data: audio declipping and high-dynamic range imaging
by: Sechaud, Victor, et al.
Published: (2026)
by: Sechaud, Victor, et al.
Published: (2026)
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
by: Wang, Wenbin, et al.
Published: (2024)
by: Wang, Wenbin, et al.
Published: (2024)
Keep what you need : extracting efficient subnetworks from large audio representation models
by: Genova, David, et al.
Published: (2025)
by: Genova, David, et al.
Published: (2025)
SS-DPPN: A self-supervised dual-path foundation model for the generalizable cardiac audio representation
by: Muna, Ummy Maria, et al.
Published: (2025)
by: Muna, Ummy Maria, et al.
Published: (2025)
ADIFF: Explaining audio difference using natural language
by: Deshmukh, Soham, et al.
Published: (2025)
by: Deshmukh, Soham, et al.
Published: (2025)
Mellow: a small audio language model for reasoning
by: Deshmukh, Soham, et al.
Published: (2025)
by: Deshmukh, Soham, et al.
Published: (2025)
End-to-end audio-visual learning for cochlear implant sound coding simulations in noisy environments
by: Lin, Meng-Ping, et al.
Published: (2025)
by: Lin, Meng-Ping, et al.
Published: (2025)
Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering
by: Zhao, Jinghua, et al.
Published: (2025)
by: Zhao, Jinghua, et al.
Published: (2025)
MBCodec:Thorough disentangle for high-fidelity audio compression
by: Zhang, Ruonan, et al.
Published: (2025)
by: Zhang, Ruonan, et al.
Published: (2025)
SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture
by: Sui, Kehan, et al.
Published: (2025)
by: Sui, Kehan, et al.
Published: (2025)
Hardware-accelerated graph neural networks: an alternative approach for neuromorphic event-based audio classification and keyword spotting on SoC FPGA
by: Jeziorek, Kamil, et al.
Published: (2026)
by: Jeziorek, Kamil, et al.
Published: (2026)
RAS: a Reliability Oriented Metric for Automatic Speech Recognition
by: Huang, Wenbin, et al.
Published: (2026)
by: Huang, Wenbin, et al.
Published: (2026)
Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning
by: Yang, Dongchao, et al.
Published: (2025)
by: Yang, Dongchao, et al.
Published: (2025)
AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)
by: Yadav, Sarthak, et al.
Published: (2025)
GRAM: Spatial general-purpose audio representation models for real-world applications
by: Yuksel, Goksenin, et al.
Published: (2025)
by: Yuksel, Goksenin, et al.
Published: (2025)
Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues
by: Nasr, Seham, et al.
Published: (2025)
by: Nasr, Seham, et al.
Published: (2025)
AISTAT lab system for DCASE2025 Task6: Language-based audio retrieval
by: Kim, Hyun Jun, et al.
Published: (2025)
by: Kim, Hyun Jun, et al.
Published: (2025)
Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
by: Roman, Adrian S., et al.
Published: (2024)
by: Roman, Adrian S., et al.
Published: (2024)
Recomposer: Event-roll-guided generative audio editing
by: Ellis, Daniel P. W., et al.
Published: (2025)
by: Ellis, Daniel P. W., et al.
Published: (2025)
AST: Adaptive, Seamless, and Training-Free Precise Speech Editing
by: Lv, Sihan, et al.
Published: (2026)
by: Lv, Sihan, et al.
Published: (2026)
Where are we in audio deepfake detection? A systematic analysis over generative and detection models
by: Li, Xiang, et al.
Published: (2024)
by: Li, Xiang, et al.
Published: (2024)
DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
by: Zhao, Lei, et al.
Published: (2025)
by: Zhao, Lei, et al.
Published: (2025)
Forensic deepfake audio detection using segmental speech features
by: Yang, Tianle, et al.
Published: (2025)
by: Yang, Tianle, et al.
Published: (2025)
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
by: Robinson, David, et al.
Published: (2024)
by: Robinson, David, et al.
Published: (2024)
A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification
by: Olvera, Michel, et al.
Published: (2024)
by: Olvera, Michel, et al.
Published: (2024)
Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
by: Chen, Jinming, et al.
Published: (2024)
by: Chen, Jinming, et al.
Published: (2024)
Keyword spotting using convolutional neural network for speech recognition in Hindi
by: Bharti, Saru, et al.
Published: (2026)
by: Bharti, Saru, et al.
Published: (2026)
Exploring bat song syllable representations in self-supervised audio encoders
by: Kloots, Marianne de Heer, et al.
Published: (2024)
by: Kloots, Marianne de Heer, et al.
Published: (2024)
TimberAgent: Gram-Guided Retrieval for Executable Music Effect Control
by: He, Shihao, et al.
Published: (2026)
by: He, Shihao, et al.
Published: (2026)
Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention
by: Wang, Cong, et al.
Published: (2025)
by: Wang, Cong, et al.
Published: (2025)
A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
by: Zhao, Dongdi, et al.
Published: (2024)
by: Zhao, Dongdi, et al.
Published: (2024)
Joint sentiment analysis of lyrics and audio in music
by: Schaab, Lea, et al.
Published: (2024)
by: Schaab, Lea, et al.
Published: (2024)
Adaptive Accompaniment with ReaLchords
by: Wu, Yusong, et al.
Published: (2025)
by: Wu, Yusong, et al.
Published: (2025)
MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2026)
by: Yang, Chih-Kai, et al.
Published: (2026)
Similar Items
-
SwitchCodec: A High-Fidelity Nerual Audio Codec With Sparse Quantization
by: Wang, Jin, et al.
Published: (2025) -
TQCodec: Towards neural audio codec for high-fidelity music streaming
by: He, Lixing, et al.
Published: (2026) -
Stage-adaptive audio diffusion modeling
by: Zhang, Xuanhao, et al.
Published: (2026) -
Making deep neural networks work for medical audio: representation, compression and domain adaptation
by: Onu, Charles C
Published: (2025) -
An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)