Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	He, Xian, Zeng, Wei, Wang, Ye
Format:	Preprint
Published:	2025
Subjects:	Sound Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2510.10175
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866918158742323200
author	He, Xian Zeng, Wei Wang, Ye
author_facet	He, Xian Zeng, Wei Wang, Ye
contents	Highly-informed Expressive Performance Rendering (EPR) systems transform music scores with rich musical annotations into human-like expressive performance MIDI files. While these systems have achieved promising results, the availability of detailed music scores is limited compared to MIDI files and are less flexible to work with using a digital audio workstation (DAW). Recent advancements in low-informed EPR systems offer a more accessible alternative by directly utilizing score-derived MIDI as input, but these systems often exhibit suboptimal performance. Meanwhile, existing works are evaluated with diverse automatic metrics and data formats, hindering direct objective comparisons between EPR systems. In this study, we introduce Peransformer, a transformer-based low-informed EPR system designed to bridge the gap between low-informed and highly-informed EPR systems. Our approach incorporates a score-aware discriminator that leverages the underlying score-derived MIDI files and is trained on a score-to-performance paired, note-to-note aligned MIDI dataset. Experimental results demonstrate that Peransformer achieves state-of-the-art performance among low-informed systems, as validated by subjective evaluations. Furthermore, we extend existing automatic evaluation metrics for EPR systems and introduce generalized EPR metrics (GEM), enabling more direct, accurate, and reliable comparisons across EPR systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_10175
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Peransformer: Improving Low-informed Expressive Performance Rendering with Score-aware Discriminator He, Xian Zeng, Wei Wang, Ye Sound Audio and Speech Processing Highly-informed Expressive Performance Rendering (EPR) systems transform music scores with rich musical annotations into human-like expressive performance MIDI files. While these systems have achieved promising results, the availability of detailed music scores is limited compared to MIDI files and are less flexible to work with using a digital audio workstation (DAW). Recent advancements in low-informed EPR systems offer a more accessible alternative by directly utilizing score-derived MIDI as input, but these systems often exhibit suboptimal performance. Meanwhile, existing works are evaluated with diverse automatic metrics and data formats, hindering direct objective comparisons between EPR systems. In this study, we introduce Peransformer, a transformer-based low-informed EPR system designed to bridge the gap between low-informed and highly-informed EPR systems. Our approach incorporates a score-aware discriminator that leverages the underlying score-derived MIDI files and is trained on a score-to-performance paired, note-to-note aligned MIDI dataset. Experimental results demonstrate that Peransformer achieves state-of-the-art performance among low-informed systems, as validated by subjective evaluations. Furthermore, we extend existing automatic evaluation metrics for EPR systems and introduce generalized EPR metrics (GEM), enabling more direct, accurate, and reliable comparisons across EPR systems.
title	Peransformer: Improving Low-informed Expressive Performance Rendering with Score-aware Discriminator
topic	Sound Audio and Speech Processing
url	https://arxiv.org/abs/2510.10175

Similar Items