Bibliographic Details
Main Authors: Borovik, Ilya, Gavrilev, Dmitrii, Viro, Vladimir
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2511.03425
author Borovik, Ilya
Gavrilev, Dmitrii
Viro, Vladimir
contents Emotions are fundamental to the creation and perception of music performances. However, achieving human-like expression and emotion through machine learning models for performance rendering remains a challenging task. In this work, we present SyMuPe, a novel framework for developing and training affective and controllable symbolic piano performance models. Our flagship model, PianoFlow, uses conditional flow matching trained to solve diverse multi-mask performance inpainting tasks. By design, it supports both unconditional generation and infilling of music performance features. For training, we use a curated, cleaned dataset of 2,968 hours of aligned musical scores and expressive MIDI performances. For text and emotion control, we integrate a piano performance emotion classifier and tune PianoFlow with the emotion-weighted Flan-T5 text embeddings provided as conditional inputs. Objective and subjective evaluations against transformer-based baselines and existing models show that PianoFlow not only outperforms other approaches, but also achieves performance quality comparable to that of human-recorded and transcribed MIDI samples. For emotion control, we present and analyze samples generated under different text conditioning scenarios. The developed model can be integrated into interactive applications, contributing to the creation of more accessible and engaging music performance systems.
format Preprint
id arxiv_https___arxiv_org_abs_2511_03425
institution arXiv
publishDate 2025
record_format arxiv
title SyMuPe: Affective and Controllable Symbolic Music Performance
topic Sound
Machine Learning
Multimedia
url https://arxiv.org/abs/2511.03425