Staff View: SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward :: Library Catalog

Bibliographic Details
Main Authors:	Jonason, Nicolas; Casini, Luca; Sturm, Bob L. T.
Format:	Preprint
Published:	2025
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2504.16839

_version_	1866908334512144384
author	Jonason, Nicolas; Casini, Luca; Sturm, Bob L. T.
author_facet	Jonason, Nicolas; Casini, Luca; Sturm, Bob L. T.
contents	Recent work has proposed training machine learning models to predict aesthetic ratings for music audio. Our work explores whether such models can be used to finetune a symbolic music generation system with reinforcement learning, and what effect this has on the system outputs. To test this, we use group relative policy optimization to finetune a piano MIDI model with Meta Audiobox Aesthetics ratings of audio-rendered outputs as the reward. We find that this optimization has effects on multiple low-level features of the generated outputs, and improves the average subjective ratings in a preliminary listening study with 14 participants. We also find that over-optimization dramatically reduces diversity of model outputs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_16839
institution	arXiv
publishDate	2025
record_format	arxiv
title	SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward
topic	Sound
url	https://arxiv.org/abs/2504.16839

Similar Items