| Main Authors: | Wang, Pu; Li, Junhui; Li, Jialu; Guo, Liangdong; Zhang, Youshan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.09154 |
Similar Items
Complex Image-Generative Diffusion Transformer for Audio Denoising
by: Li, Junhui, et al.
Published: (2024)
Vision Transformer Segmentation for Visual Bird Sound Denoising
by: Kumar, Sahil, et al.
Published: (2024)
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
by: Liu, Jizhong, et al.
Published: (2024)
Next Tokens Denoising for Speech Synthesis
by: Liu, Yanqing, et al.
Published: (2025)
MiMo-Audio: Audio Language Models are Few-Shot Learners
by: Core Team, et al.
Published: (2025)
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders
by: Zhang, Wenyu, et al.
Published: (2024)
ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
by: Li, Bohan, et al.
Published: (2025)
Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models
by: Sridhar, Arvind Krishna, et al.
Published: (2024)
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
by: Benita, Roi, et al.
Published: (2023)
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model
by: Huang, Ailin, et al.
Published: (2025)
Step-Audio 2 Technical Report
by: Wu, Boyong, et al.
Published: (2025)
Covo-Audio Technical Report
by: Wang, Wenfu, et al.
Published: (2026)
An Audio-textual Diffusion Model For Converting Speech Signals Into Ultrasound Tongue Imaging Data
by: Yang, Yudong, et al.
Published: (2024)
AudioBench: A Universal Benchmark for Audio Large Language Models
by: Wang, Bin, et al.
Published: (2024)
Bridging Language Gaps in Audio-Text Retrieval
by: Yan, Zhiyong, et al.
Published: (2024)
Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model
by: Huang, Hukai, et al.
Published: (2024)
Audio-Thinker: Guiding Audio Language Model When and How to Think via Reinforcement Learning
by: Wu, Shu, et al.
Published: (2025)
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
by: Li, Tianpeng, et al.
Published: (2025)
Improving Whisper's Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text
by: Li, Jinpeng, et al.
Published: (2024)
Cross-lingual Alzheimer's Disease detection based on paralinguistic and pre-trained features
by: Chen, Xuchu, et al.
Published: (2023)
Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval
by: Zhou, Lifeng, et al.
Published: (2024)
The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data
by: Baird, Alice, et al.
Published: (2024)
YODAS: Youtube-Oriented Dataset for Audio and Speech
by: Li, Xinjian, et al.
Published: (2024)
Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
by: Wang, Jun, et al.
Published: (2025)
Direct Simultaneous Translation Activation for Large Audio-Language Models
by: Zhang, Pei, et al.
Published: (2025)
DENOASR: Debiasing ASRs through Selective Denoising
by: Rai, Anand Kumar, et al.
Published: (2024)
ALLM4ADD: Unlocking the Capabilities of Audio Large Language Models for Audio Deepfake Detection
by: Gu, Hao, et al.
Published: (2025)
Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective
by: Chen, Chen, et al.
Published: (2024)
Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio
by: Zhang, Lin, et al.
Published: (2024)
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
by: Chen, Guoguo, et al.
Published: (2021)
MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research
by: Li, Song, et al.
Published: (2024)
Mind the Gap! Static and Interactive Evaluations of Large Audio Models
by: Li, Minzhi, et al.
Published: (2025)
SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning
by: Wang, Peidong, et al.
Published: (2026)
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
by: Chen, Chen, et al.
Published: (2025)
BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR
by: Ma, Guodong, et al.
Published: (2025)
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
by: Wei, Kun, et al.
Published: (2023)
BATON: Aligning Text-to-Audio Model with Human Preference Feedback
by: Liao, Huan, et al.
Published: (2024)
What Are They Doing? Joint Audio-Speech Co-Reasoning
by: Wang, Yingzhi, et al.
Published: (2024)
SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection
by: Yi, Jiangyan, et al.
Published: (2022)
Audios Don't Lie: Multi-Frequency Channel Attention Mechanism for Audio Deepfake Detection
by: Feng, Yangguang
Published: (2024)