| Main Authors: | Yang, Meng, McCormack, Jon, Llano, Maria Teresa, Su, Wanchao, Lei, Chao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.21740 |
Similar Items
Exploring the Feasibility of LLMs for Automated Music Emotion Annotation
by: Yang, Meng, et al.
Published: (2025)
MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models
by: Liu, Shansong, et al.
Published: (2024)
MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation
by: Wu, Shih-Lun, et al.
Published: (2025)
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
by: Yeo, Jeong Hun, et al.
Published: (2025)
SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition
by: Cheng, Zebang, et al.
Published: (2024)
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition
by: Radhakrishnan, Srijith, et al.
Published: (2023)
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
by: Cheng, Zebang, et al.
Published: (2024)
Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction
by: Wang, Jun-You, et al.
Published: (2025)
Amadeus: Autoregressive Model with Bidirectional Attribute Modelling for Symbolic Music
by: Su, Hongju, et al.
Published: (2025)
Adaptable Symbolic Music Infilling with MIDI-RWKV
by: Zhou-Zheng, Christian, et al.
Published: (2025)
MeloTrans: A Text to Symbolic Music Generation Model Following Human Composition Habit
by: Wang, Yutian, et al.
Published: (2024)
MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition
by: Pasquier, Philippe, et al.
Published: (2025)
Flexible Control in Symbolic Music Generation via Musical Metadata
by: Han, Sangjun, et al.
Published: (2024)
Optimizing Feature Extraction for Symbolic Music
by: Simonetta, Federico, et al.
Published: (2023)
SyMuPe: Affective and Controllable Symbolic Music Performance
by: Borovik, Ilya, et al.
Published: (2025)
DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning
by: Mao, Zhuoyuan, et al.
Published: (2025)
Dance2MIDI: Dance-driven multi-instruments music generation
by: Han, Bo, et al.
Published: (2023)
Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations
by: Wachter, Maximilian, et al.
Published: (2026)
MusicAOG: an Energy-Based Model for Learning and Sampling a Hierarchical Representation of Symbolic Music
by: Qian, Yikai, et al.
Published: (2024)
MuseAgent-1: Interactive Grounded Multimodal Understanding of Music Scores and Performance Audio
by: Zhao, Qihao, et al.
Published: (2026)
Beat-Based Rhythm Quantization of MIDI Performances
by: Wachter, Maximilian, et al.
Published: (2025)
Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks
by: Geiger, Jonas, et al.
Published: (2025)
Let the Model Learn to Feel: Mode-Guided Tonality Injection for Symbolic Music Emotion Recognition
by: Xia, Haiying, et al.
Published: (2025)
Video Echoed in Music: Semantic, Temporal, and Rhythmic Alignment for Video-to-Music Generation
by: Tong, Xinyi, et al.
Published: (2025)
Frechet Music Distance: A Metric For Generative Symbolic Music Evaluation
by: Retkowski, Jan, et al.
Published: (2024)
MusicWeaver: Composer-Style Structural Editing and Minute-Scale Coherent Music Generation
by: Wang, Xuanchen, et al.
Published: (2025)
MusER: Musical Element-Based Regularization for Generating Symbolic Music with Emotion
by: Ji, Shulei, et al.
Published: (2023)
Gesture2Music: A Low-Latency Real-Time Framework for Continuous Gesture-Driven Music Generation
by: Jeyaraj, Rathinaraja, et al.
Published: (2025)
Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription
by: Hamberger, Anna, et al.
Published: (2025)
Mixer Metaphors: audio interfaces for non-musical applications
by: McNamara, Tace, et al.
Published: (2025)
A Survey on Multimodal Music Emotion Recognition
by: Liyanarachchi, Rashini, et al.
Published: (2025)
On the de-duplication of the Lakh MIDI dataset
by: Choi, Eunjin, et al.
Published: (2025)
SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing
by: Ma, Ziyang, et al.
Published: (2026)
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
by: Weck, Benno, et al.
Published: (2024)
PianoBind: A Multimodal Joint Embedding Model for Pop-piano Music
by: Bang, Hayeon, et al.
Published: (2025)
MidiCaps: A large-scale MIDI dataset with text captions
by: Melechovsky, Jan, et al.
Published: (2024)
PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing
by: Long, Phillip, et al.
Published: (2024)
HAFM: Hierarchical Autoregressive Foundation Model for Music Accompaniment Generation
by: Zhu, Jian, et al.
Published: (2026)
A Survey of Foundation Models for Music Understanding
by: Li, Wenjun, et al.
Published: (2024)
Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture
by: Murgul, Sebastian, et al.
Published: (2025)