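Staff View: Musical Score Understanding Benchmark: Evaluating Large Language Models' Comprehension of Complete Musical Scores :: Library Catalog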

Bibliographic Details
Main Authors:	Dai, Congren; Yang, Yue; Li, Krinos; Zhou, Huichi; Liang, Shijie; Zhang, Bo; Liu, Enyang; Jin, Ge; An, Hongran; Zhang, Haosen; Jing, Peiyuan; Lee, Kinhei; Zhang, Zhenxuan; Li, Xiaobing; Sun, Maosong
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.20697

_version_	1866913057093976064
author	Dai, Congren Yang, Yue Li, Krinos Zhou, Huichi Liang, Shijie Zhang, Bo Liu, Enyang Jin, Ge An, Hongran Zhang, Haosen Jing, Peiyuan Lee, Kinhei Zhang, Zhenxuan Li, Xiaobing Sun, Maosong
author_facet	Dai, Congren Yang, Yue Li, Krinos Zhou, Huichi Liang, Shijie Zhang, Bo Liu, Enyang Jin, Ge An, Hongran Zhang, Haosen Jing, Peiyuan Lee, Kinhei Zhang, Zhenxuan Li, Xiaobing Sun, Maosong
contents	Understanding complete musical scores entails integrated reasoning over pitch, rhythm, harmony, and large-scale structure, yet the ability of Large Language Models and Vision-Language Models to interpret full musical notation remains insufficiently examined. We introduce Musical Score Understanding Benchmark (MSU-Bench), a human-curated benchmark for score-level musical understanding across textual (ABC notation) and visual (PDF) modalities. MSU-Bench contains 1,800 generative question-answer pairs from works by Bach, Beethoven, Chopin, Debussy, and others, organised into four levels of increasing difficulty, ranging from onset information to texture and form. Evaluations of more than fifteen state-of-the-art models, in both zero-shot and fine-tuned settings, reveal pronounced modality gaps, unstable level-wise performance, and challenges in maintaining multilevel correctness. Fine-tuning substantially improves results across modalities while preserving general knowledge, positioning MSU-Bench as a robust foundation for future research in multimodal reasoning. The benchmark and code are available at https://github.com/Congren-Dai/MSU-Bench.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_20697
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Musical Score Understanding Benchmark: Evaluating Large Language Models' Comprehension of Complete Musical Scores Dai, Congren Yang, Yue Li, Krinos Zhou, Huichi Liang, Shijie Zhang, Bo Liu, Enyang Jin, Ge An, Hongran Zhang, Haosen Jing, Peiyuan Lee, Kinhei Zhang, Zhenxuan Li, Xiaobing Sun, Maosong Sound Artificial Intelligence Understanding complete musical scores entails integrated reasoning over pitch, rhythm, harmony, and large-scale structure, yet the ability of Large Language Models and Vision-Language Models to interpret full musical notation remains insufficiently examined. We introduce Musical Score Understanding Benchmark (MSU-Bench), a human-curated benchmark for score-level musical understanding across textual (ABC notation) and visual (PDF) modalities. MSU-Bench contains 1,800 generative question-answer pairs from works by Bach, Beethoven, Chopin, Debussy, and others, organised into four levels of increasing difficulty, ranging from onset information to texture and form. Evaluations of more than fifteen state-of-the-art models, in both zero-shot and fine-tuned settings, reveal pronounced modality gaps, unstable level-wise performance, and challenges in maintaining multilevel correctness. Fine-tuning substantially improves results across modalities while preserving general knowledge, positioning MSU-Bench as a robust foundation for future research in multimodal reasoning. The benchmark and code are available at https://github.com/Congren-Dai/MSU-Bench.
title	Musical Score Understanding Benchmark: Evaluating Large Language Models' Comprehension of Complete Musical Scores
topic	Sound Artificial Intelligence
url	https://arxiv.org/abs/2511.20697

Similar Items