Staff View: From Audio Encoders to Piano Judges: Benchmarking Performance Understanding for Solo Piano :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Huan; Liang, Jinhua; Dixon, Simon
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.04518

_version_	1866909262308966400
author	Zhang, Huan Liang, Jinhua Dixon, Simon
author_facet	Zhang, Huan Liang, Jinhua Dixon, Simon
contents	Our study investigates an approach for understanding musical performances through the lens of audio encoding models, focusing on the domain of solo Western classical piano music. Compared to composition-level attribute understanding such as key or genre, we identify a knowledge gap in performance-level music understanding, and address three critical tasks: expertise ranking, difficulty estimation, and piano technique detection, introducing a comprehensive Pianism-Labelling Dataset (PLD) for this purpose. We leverage pre-trained audio encoders, specifically Jukebox, Audio-MAE, MERT, and DAC, demonstrating varied capabilities in tackling downstream tasks, to explore whether domain-specific fine-tuning enhances capability in capturing performance nuances. Our best approach achieved 93.6% accuracy in expertise ranking, 33.7% in difficulty estimation, and 46.7% in technique detection, with Audio-MAE as the overall most effective encoder. Finally, we conducted a case study on Chopin Piano Competition data using trained models for expertise ranking, which highlights the challenge of accurately assessing top-tier performances.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_04518
institution	arXiv
publishDate	2024
record_format	arxiv
title	From Audio Encoders to Piano Judges: Benchmarking Performance Understanding for Solo Piano
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2407.04518

Similar Items