| Main Authors: | Jonason, Nicolas; Casini, Luca; Sturm, Bob L. T. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Sound |
| Online Access: | https://arxiv.org/abs/2504.16839 |
| _version_ | 1866908334512144384 |
|---|---|
| author | Jonason, Nicolas; Casini, Luca; Sturm, Bob L. T. |
| author_facet | Jonason, Nicolas; Casini, Luca; Sturm, Bob L. T. |
| contents | Recent work has proposed training machine learning models to predict aesthetic ratings for music audio. Our work explores whether such models can be used to finetune a symbolic music generation system with reinforcement learning, and what effect this has on the system outputs. To test this, we use group relative policy optimization to finetune a piano MIDI model with Meta Audiobox Aesthetics ratings of audio-rendered outputs as the reward. We find that this optimization has effects on multiple low-level features of the generated outputs, and improves the average subjective ratings in a preliminary listening study with 14 participants. We also find that over-optimization dramatically reduces diversity of model outputs. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2504_16839 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward |
| topic | Sound |
| url | https://arxiv.org/abs/2504.16839 |