Bibliographic Details
Main Authors: Jonason, Nicolas; Casini, Luca; Sturm, Bob L. T.
Format: Preprint
Published: 2025
Subjects: Sound
Online Access: https://arxiv.org/abs/2504.16839
_version_ 1866908334512144384
author Jonason, Nicolas
Casini, Luca
Sturm, Bob L. T.
author_facet Jonason, Nicolas
Casini, Luca
Sturm, Bob L. T.
contents Recent work has proposed training machine learning models to predict aesthetic ratings for music audio. Our work explores whether such models can be used to finetune a symbolic music generation system with reinforcement learning, and what effect this has on the system outputs. To test this, we use group relative policy optimization to finetune a piano MIDI model with Meta Audiobox Aesthetics ratings of audio-rendered outputs as the reward. We find that this optimization has effects on multiple low-level features of the generated outputs, and improves the average subjective ratings in a preliminary listening study with 14 participants. We also find that over-optimization dramatically reduces diversity of model outputs.
format Preprint
id arxiv_https___arxiv_org_abs_2504_16839
institution arXiv
publishDate 2025
record_format arxiv
title SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward
topic Sound
url https://arxiv.org/abs/2504.16839