| Main Authors: | Deng, Qixin, Yang, Qikai, Yuan, Ruibin, Huang, Yipeng, Wang, Yi, Liu, Xubo, Tian, Zeyue, Pan, Jiahao, Zhang, Ge, Lin, Hanfeng, Li, Yizhi, Ma, Yinghao, Fu, Jie, Lin, Chenghua, Benetos, Emmanouil, Wang, Wenwu, Xia, Guangyu, Xue, Wei, Guo, Yike |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.18081 |
Similar Items
Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation
by: Zhou, Ziya, et al.
Published: (2024)
AutoMV: An Automatic Multi-Agent System for Music Video Generation
by: Tang, Xiaoxuan, et al.
Published: (2025)
ChatMusician: Understanding and Generating Music Intrinsically with LLM
by: Yuan, Ruibin, et al.
Published: (2024)
CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction
by: Ma, Yinghao, et al.
Published: (2026)
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
by: Deng, Zihao, et al.
Published: (2023)
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
by: Tian, Zeyue, et al.
Published: (2024)
YuE: Scaling Open Foundation Models for Long-Form Music Generation
by: Yuan, Ruibin, et al.
Published: (2025)
Audio-FLAN: A Preliminary Release
by: Xue, Liumeng, et al.
Published: (2025)
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
by: Weck, Benno, et al.
Published: (2024)
MusicWeaver: Composer-Style Structural Editing and Minute-Scale Coherent Music Generation
by: Wang, Xuanchen, et al.
Published: (2025)
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
by: Chi, Xiaowei, et al.
Published: (2024)
MusicAOG: an Energy-Based Model for Learning and Sampling a Hierarchical Representation of Symbolic Music
by: Qian, Yikai, et al.
Published: (2024)
Towards Generating Diverse Audio Captions via Adversarial Training
by: Mei, Xinhao, et al.
Published: (2022)
AudioX: A Unified Framework for Anything-to-Audio Generation
by: Tian, Zeyue, et al.
Published: (2025)
LLMs Meet Multimodal Generation and Editing: A Survey
by: He, Yingqing, et al.
Published: (2024)
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
by: Zhuo, Le, et al.
Published: (2023)
Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing
by: Tian, Zeyue, et al.
Published: (2026)
MusFlow: Multimodal Music Generation via Conditional Flow Matching
by: Song, Jiahao, et al.
Published: (2025)
Learning Temporal Resolution in Spectrogram for Audio Classification
by: Liu, Haohe, et al.
Published: (2022)
Retrieval-Augmented Text-to-Audio Generation
by: Yuan, Yi, et al.
Published: (2023)
Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study
by: Yuan, Yi, et al.
Published: (2023)
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
by: Ma, Ziyang, et al.
Published: (2025)
Multimodal Fish Feeding Intensity Assessment in Aquaculture
by: Cui, Meng, et al.
Published: (2023)
Flexible Control in Symbolic Music Generation via Musical Metadata
by: Han, Sangjun, et al.
Published: (2024)
Optimizing Feature Extraction for Symbolic Music
by: Simonetta, Federico, et al.
Published: (2023)
MIDI-LLaMA: An Instruction-Following Multimodal LLM for Symbolic Music Understanding
by: Yang, Meng, et al.
Published: (2026)
Video Echoed in Music: Semantic, Temporal, and Rhythmic Alignment for Video-to-Music Generation
by: Tong, Xinyi, et al.
Published: (2025)
Interpretable Zero-shot Referring Expression Comprehension with Query-driven Scene Graphs
by: Wu, Yike, et al.
Published: (2026)
MeloTrans: A Text to Symbolic Music Generation Model Following Human Composition Habit
by: Wang, Yutian, et al.
Published: (2024)
SyMuPe: Affective and Controllable Symbolic Music Performance
by: Borovik, Ilya, et al.
Published: (2025)
Amadeus: Autoregressive Model with Bidirectional Attribute Modelling for Symbolic Music
by: Su, Hongju, et al.
Published: (2025)
CoComposer: LLM Multi-agent Collaborative Music Composition
by: Xing, Peiwen, et al.
Published: (2025)
Physics-Aware Novel-View Acoustic Synthesis with Vision-Language Priors and 3D Acoustic Environment Modeling
by: Fan, Congyi, et al.
Published: (2026)
Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction
by: Wang, Jun-You, et al.
Published: (2025)
VAInpaint: Zero-Shot Video-Audio inpainting framework with LLMs-driven Module
by: Wu, Kam Man, et al.
Published: (2025)
Frechet Music Distance: A Metric For Generative Symbolic Music Evaluation
by: Retkowski, Jan, et al.
Published: (2024)
MixFake: Benchmarking and Enhancing Audio Deepfake Detection in Diverse Real-world Mixed Audio
by: Li, Qingcao, et al.
Published: (2026)
Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions
by: Yuan, Yi, et al.
Published: (2024)
Music Grounding by Short Video
by: Xin, Zijie, et al.
Published: (2024)
MusicScore: A Dataset for Music Score Modeling and Generation
by: Lin, Yuheng, et al.
Published: (2024)