Staff View: DExter: Learning and Controlling Performance Expression with Diffusion Models :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Huan; Chowdhury, Shreyan; Cancino-Chacón, Carlos Eduardo; Liang, Jinhua; Dixon, Simon; Widmer, Gerhard
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2406.14850

_version_	1866910497087946752
author	Zhang, Huan; Chowdhury, Shreyan; Cancino-Chacón, Carlos Eduardo; Liang, Jinhua; Dixon, Simon; Widmer, Gerhard
author_facet	Zhang, Huan; Chowdhury, Shreyan; Cancino-Chacón, Carlos Eduardo; Liang, Jinhua; Dixon, Simon; Widmer, Gerhard
contents	In the pursuit of developing expressive music performance models using artificial intelligence, this paper introduces DExter, a new approach leveraging diffusion probabilistic models to render Western classical piano performances. In this approach, performance parameters are represented in a continuous expression space and a diffusion model is trained to predict these continuous parameters while being conditioned on the musical score. Furthermore, DExter also enables the generation of interpretations (expressive variations of a performance) guided by perceptually meaningful features by conditioning jointly on score and perceptual feature representations. Consequently, we find that our model is useful for learning expressive performance, generating perceptually steered performances, and transferring performance styles. We assess the model through quantitative and qualitative analyses, focusing on specific performance metrics regarding dimensions like asynchrony and articulation, as well as through listening tests comparing generated performances with different human interpretations. Results show that DExter is able to capture the time-varying correlation of the expressive parameters, and compares well to existing rendering models in subjectively evaluated ratings. The perceptual-feature-conditioned generation and transferring capabilities of DExter are verified by a proxy model predicting perceptual characteristics of differently steered performances.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_14850
institution	arXiv
publishDate	2024
record_format	arxiv
title	DExter: Learning and Controlling Performance Expression with Diffusion Models
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2406.14850

Similar Items