:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Meng; McCormack, Jon; Llano, Maria Teresa; Su, Wanchao; Lei, Chao
Format:	Preprint
Published:	2026
Subjects:	Multimedia; Sound
Online Access:	https://arxiv.org/abs/2601.21740

Similar Items

Exploring the Feasibility of LLMs for Automated Music Emotion Annotation
by: Yang, Meng, et al.
Published: (2025)

MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models
by: Liu, Shansong, et al.
Published: (2024)

MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation
by: Wu, Shih-Lun, et al.
Published: (2025)

MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
by: Yeo, Jeong Hun, et al.
Published: (2025)

SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition
by: Cheng, Zebang, et al.
Published: (2024)

Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition
by: Radhakrishnan, Srijith, et al.
Published: (2023)

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
by: Cheng, Zebang, et al.
Published: (2024)

Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction
by: Wang, Jun-You, et al.
Published: (2025)

Amadeus: Autoregressive Model with Bidirectional Attribute Modelling for Symbolic Music
by: Su, Hongju, et al.
Published: (2025)

Adaptable Symbolic Music Infilling with MIDI-RWKV
by: Zhou-Zheng, Christian, et al.
Published: (2025)

MeloTrans: A Text to Symbolic Music Generation Model Following Human Composition Habit
by: Wang, Yutian, et al.
Published: (2024)

MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition
by: Pasquier, Philippe, et al.
Published: (2025)

Flexible Control in Symbolic Music Generation via Musical Metadata
by: Han, Sangjun, et al.
Published: (2024)

Optimizing Feature Extraction for Symbolic Music
by: Simonetta, Federico, et al.
Published: (2023)

SyMuPe: Affective and Controllable Symbolic Music Performance
by: Borovik, Ilya, et al.
Published: (2025)

DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning
by: Mao, Zhuoyuan, et al.
Published: (2025)

Dance2MIDI: Dance-driven multi-instruments music generation
by: Han, Bo, et al.
Published: (2023)

Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations
by: Wachter, Maximilian, et al.
Published: (2026)

MusicAOG: an Energy-Based Model for Learning and Sampling a Hierarchical Representation of Symbolic Music
by: Qian, Yikai, et al.
Published: (2024)

MuseAgent-1: Interactive Grounded Multimodal Understanding of Music Scores and Performance Audio
by: Zhao, Qihao, et al.
Published: (2026)

Beat-Based Rhythm Quantization of MIDI Performances
by: Wachter, Maximilian, et al.
Published: (2025)

Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks
by: Geiger, Jonas, et al.
Published: (2025)

Let the Model Learn to Feel: Mode-Guided Tonality Injection for Symbolic Music Emotion Recognition
by: Xia, Haiying, et al.
Published: (2025)

Video Echoed in Music: Semantic, Temporal, and Rhythmic Alignment for Video-to-Music Generation
by: Tong, Xinyi, et al.
Published: (2025)

Frechet Music Distance: A Metric For Generative Symbolic Music Evaluation
by: Retkowski, Jan, et al.
Published: (2024)

MusicWeaver: Composer-Style Structural Editing and Minute-Scale Coherent Music Generation
by: Wang, Xuanchen, et al.
Published: (2025)

MusER: Musical Element-Based Regularization for Generating Symbolic Music with Emotion
by: Ji, Shulei, et al.
Published: (2023)

Gesture2Music: A Low-Latency Real-Time Framework for Continuous Gesture-Driven Music Generation
by: Jeyaraj, Rathinaraja, et al.
Published: (2025)

Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription
by: Hamberger, Anna, et al.
Published: (2025)

Mixer Metaphors: audio interfaces for non-musical applications
by: McNamara, Tace, et al.
Published: (2025)

A Survey on Multimodal Music Emotion Recognition
by: Liyanarachchi, Rashini, et al.
Published: (2025)

On the de-duplication of the Lakh MIDI dataset
by: Choi, Eunjin, et al.
Published: (2025)

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing
by: Ma, Ziyang, et al.
Published: (2026)

MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
by: Weck, Benno, et al.
Published: (2024)

PianoBind: A Multimodal Joint Embedding Model for Pop-piano Music
by: Bang, Hayeon, et al.
Published: (2025)

MidiCaps: A large-scale MIDI dataset with text captions
by: Melechovsky, Jan, et al.
Published: (2024)

PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing
by: Long, Phillip, et al.
Published: (2024)

HAFM: Hierarchical Autoregressive Foundation Model for Music Accompaniment Generation
by: Zhu, Jian, et al.
Published: (2026)

A Survey of Foundation Models for Music Understanding
by: Li, Wenjun, et al.
Published: (2024)

Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture
by: Murgul, Sebastian, et al.
Published: (2025)