Staff View: SyMuPe: Affective and Controllable Symbolic Music Performance :: Library Catalog

Bibliographic Details
Main Authors:	Borovik, Ilya; Gavrilev, Dmitrii; Viro, Vladimir
Format:	Preprint
Published:	2025
Subjects:	Sound; Machine Learning; Multimedia
Online Access:	https://arxiv.org/abs/2511.03425

_version_	1866917062439337984
author	Borovik, Ilya; Gavrilev, Dmitrii; Viro, Vladimir
author_facet	Borovik, Ilya; Gavrilev, Dmitrii; Viro, Vladimir
contents	Emotions are fundamental to the creation and perception of music performances. However, achieving human-like expression and emotion through machine learning models for performance rendering remains a challenging task. In this work, we present SyMuPe, a novel framework for developing and training affective and controllable symbolic piano performance models. Our flagship model, PianoFlow, uses conditional flow matching trained to solve diverse multi-mask performance inpainting tasks. By design, it supports both unconditional generation and infilling of music performance features. For training, we use a curated, cleaned dataset of 2,968 hours of aligned musical scores and expressive MIDI performances. For text and emotion control, we integrate a piano performance emotion classifier and tune PianoFlow with the emotion-weighted Flan-T5 text embeddings provided as conditional inputs. Objective and subjective evaluations against transformer-based baselines and existing models show that PianoFlow not only outperforms other approaches, but also achieves performance quality comparable to that of human-recorded and transcribed MIDI samples. For emotion control, we present and analyze samples generated under different text conditioning scenarios. The developed model can be integrated into interactive applications, contributing to the creation of more accessible and engaging music performance systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_03425
institution	arXiv
publishDate	2025
record_format	arxiv
title	SyMuPe: Affective and Controllable Symbolic Music Performance
topic	Sound; Machine Learning; Multimedia
url	https://arxiv.org/abs/2511.03425