Staff View: Evaluation of pretrained language models on music understanding :: Library Catalog

Bibliographic Details
Main Authors:	Vasilakis, Yannis; Bittner, Rachel; Pauwels, Johan
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence; Information Retrieval; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2409.11449

_version_	1866910609551917056
author	Vasilakis, Yannis; Bittner, Rachel; Pauwels, Johan
author_facet	Vasilakis, Yannis; Bittner, Rachel; Pauwels, Johan
contents	Music-text multimodal systems have enabled new approaches to Music Information Research (MIR) applications such as audio-to-text and text-to-audio retrieval, text-based song generation, and music captioning. Despite the reported success, little effort has been put into evaluating the musical knowledge of Large Language Models (LLMs). In this paper, we demonstrate that LLMs suffer from 1) prompt sensitivity, 2) inability to model negation (e.g., 'rock song without guitar'), and 3) sensitivity towards the presence of specific words. We quantified these properties as a triplet-based accuracy, evaluating the ability to model the relative similarity of labels in a hierarchical ontology. We leveraged the AudioSet ontology to generate triplets consisting of an anchor, a positive (relevant) label, and a negative (less relevant) label for the genre and instruments sub-trees. We evaluated the triplet-based musical knowledge for six general-purpose Transformer-based models. The triplets obtained through this methodology required filtering, as some were difficult to judge and therefore relatively uninformative for evaluation purposes. Despite the relatively high accuracy reported, inconsistencies are evident in all six models, suggesting that off-the-shelf LLMs need adaptation to music before use.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_11449
institution	arXiv
publishDate	2024
record_format	arxiv
title	Evaluation of pretrained language models on music understanding
topic	Machine Learning; Artificial Intelligence; Information Retrieval; Sound; Audio and Speech Processing
url	https://arxiv.org/abs/2409.11449
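
Note on the triplet-based accuracy mentioned in the contents field: the abstract describes scoring a model by whether it judges an anchor label to be closer to a positive (relevant) label than to a negative (less relevant) one. The following is a minimal sketch of that idea, not the authors' implementation. It assumes similarity is measured as cosine similarity between label embeddings, which the record does not state. The labels, the three-dimensional vectors in TOY_EMBEDDINGS, and the function names are all hypothetical and stand in for a real language model and for triplets drawn from the ontology.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def triplet_accuracy(triplets, embed):
    """Fraction of (anchor, positive, negative) triplets for which the
    anchor is more similar to the positive than to the negative."""
    correct = 0
    for anchor, positive, negative in triplets:
        a, p, n = embed(anchor), embed(positive), embed(negative)
        if cosine(a, p) > cosine(a, n):
            correct += 1
    return correct / len(triplets)


# Hypothetical toy embeddings (assumption): a real evaluation would
# obtain these vectors from the language model under test.
TOY_EMBEDDINGS = {
    "rock music": [0.9, 0.1, 0.0],
    "heavy metal": [0.8, 0.2, 0.1],
    "electric guitar": [0.7, 0.1, 0.3],
    "jazz": [0.1, 0.9, 0.1],
    "flute": [0.0, 0.3, 0.9],
}

# Hypothetical triplets (assumption), written as (anchor, positive, negative).
TRIPLETS = [
    ("rock music", "heavy metal", "jazz"),
    ("electric guitar", "rock music", "flute"),
    ("jazz", "flute", "rock music"),
]

accuracy = triplet_accuracy(TRIPLETS, TOY_EMBEDDINGS.__getitem__)
print(f"triplet accuracy: {accuracy:.2f}")
```

Under the same assumptions, prompt sensitivity could be probed by wrapping each label in different prompt templates before embedding it and comparing the resulting accuracies.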

Similar Items