Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Marmoret, Axel; Farrugia, Nicolas; Stupacher, Jan Alexander
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence; Machine Learning; Audio and Speech Processing; H.5.5
Online Access:	https://arxiv.org/abs/2603.27237

_version_	1866914429269966848
author	Marmoret, Axel; Farrugia, Nicolas; Stupacher, Jan Alexander
author_facet	Marmoret, Axel; Farrugia, Nicolas; Stupacher, Jan Alexander
contents	This study explores the extent to which deep learning models can predict groove and its related perceptual dimensions directly from audio signals. We critically examine the effectiveness of seven state-of-the-art deep learning models in predicting groove ratings and responses to groove-related queries through the extraction of audio embeddings. Additionally, we compare these predictions with traditional handcrafted audio features. To better understand the underlying mechanics, we extend this methodology to analyze predictions based on source-separated instruments, thereby isolating the contributions of individual musical elements. Our analysis reveals a clear separation of groove characteristics driven by the underlying musical style of the tracks (funk, pop, and rock). These findings indicate that deep audio representations can successfully encode complex, style-dependent groove components that traditional features often miss. Ultimately, this work highlights the capacity of advanced deep learning models to capture the multifaceted concept of groove, demonstrating the strong potential of representation learning to advance predictive Music Information Retrieval methodologies.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_27237
institution	arXiv
publishDate	2026
record_format	arxiv
title	Can pre-trained Deep Learning models predict groove ratings?
topic	Sound; Artificial Intelligence; Machine Learning; Audio and Speech Processing; H.5.5
url	https://arxiv.org/abs/2603.27237

Similar Items