| Main Authors: | Yang, Qian, Xu, Jin, Liu, Wenrui, Chu, Yunfei, Jiang, Ziyue, Zhou, Xiaohuan, Leng, Yichong, Lv, Yuanjun, Zhao, Zhou, Zhou, Chang, Zhou, Jingren |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.07729 |
Similar Items
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models
by: Liu, Wenrui, et al.
Published: (2024)
Qwen2-Audio Technical Report
by: Chu, Yunfei, et al.
Published: (2024)
TTA-Bench: A Comprehensive Benchmark for Evaluating Text-to-Audio Models
by: Wang, Hui, et al.
Published: (2025)
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
by: Song, Zirui, et al.
Published: (2025)
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
by: Du, Zhihao, et al.
Published: (2023)
AudioBench: A Universal Benchmark for Audio Large Language Models
by: Wang, Bin, et al.
Published: (2024)
Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
by: Zuo, Jialong, et al.
Published: (2025)
The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models
by: Dinkel, Heinrich, et al.
Published: (2026)
ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
by: Li, Bohan, et al.
Published: (2025)
CodecBench: A Comprehensive Benchmark for Acoustic and Semantic Evaluation
by: Deng, Ruifan, et al.
Published: (2025)
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
by: Ji, Shengpeng, et al.
Published: (2024)
DiTReducio: A Training-Free Acceleration for DiT-Based TTS via Progressive Calibration
by: Huo, Yanru, et al.
Published: (2025)
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
by: Ji, Shengpeng, et al.
Published: (2024)
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy
by: Ma, Linhan, et al.
Published: (2024)
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
by: Yang, Dongchao, et al.
Published: (2023)
PolyBench: A Benchmark for Compositional Reasoning in Polyphonic Audio
by: Chen, Yuanjian, et al.
Published: (2026)
UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models
by: Shi, Qundong, et al.
Published: (2026)
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
by: Xu, Xuenan, et al.
Published: (2023)
MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech
by: Chen, Huakang, et al.
Published: (2026)
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
by: Zhou, Yixuan, et al.
Published: (2024)
Audio Spatially-Guided Fusion for Audio-Visual Navigation
by: Zhou, Xinyu, et al.
Published: (2026)
ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
by: Yi, Jiangyan, et al.
Published: (2024)
MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis
by: Yang, Qian, et al.
Published: (2024)
Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders
by: Shan, Weiqiao, et al.
Published: (2025)
AudioMarkBench: Benchmarking Robustness of Audio Watermarking
by: Liu, Hongbin, et al.
Published: (2024)
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
by: Wang, Zehan, et al.
Published: (2025)
LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models
by: Zhao, Zhiyuan, et al.
Published: (2026)
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
by: Liu, Huadai, et al.
Published: (2024)
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
by: Liu, Huadai, et al.
Published: (2024)
ASAudio: A Survey of Advanced Spatial Audio Research
by: Zhu, Zhiyuan, et al.
Published: (2025)
Representation-Regularized Convolutional Audio Transformer for Audio Understanding
by: Han, Bing, et al.
Published: (2026)
FoleyBench: A Benchmark For Video-to-Audio Models
by: Dixit, Satvik, et al.
Published: (2025)
MSU-Bench: Towards Understanding the Conversational Multi-talker Scenarios
by: Wang, Shuai, et al.
Published: (2025)
AudioEval: Automatic Dual-Perspective and Multi-Dimensional Evaluation of Text-to-Audio-Generation
by: Wang, Hui, et al.
Published: (2025)
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
by: Kumar, Sonal, et al.
Published: (2025)
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
by: Li, Jiaqi, et al.
Published: (2024)
Online Audio-Visual Autoregressive Speaker Extraction
by: Pan, Zexu, et al.
Published: (2025)
MiDashengLM: Efficient Audio Understanding with General Audio Captions
by: Dinkel, Heinrich, et al.
Published: (2025)
Trusted Fake Audio Detection Based on Dirichlet Distribution
by: Ding, Chi, et al.
Published: (2025)
A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition
by: Zhou, Zhenyu, et al.
Published: (2024)