Staff View: ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning :: Library Catalog

Bibliographic Details
Main Authors:	Kim, Daewoong; Dong, Hao-Wen; Jeong, Dasaem
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Machine Learning; Audio and Speech Processing; Signal Processing
Online Access:	https://arxiv.org/abs/2409.12477

_version_	1866915136743145472
author	Kim, Daewoong; Dong, Hao-Wen; Jeong, Dasaem
author_facet	Kim, Daewoong; Dong, Hao-Wen; Jeong, Dasaem
contents	Modeling the natural contour of fundamental frequency (F0) plays a critical role in music audio synthesis. However, transcribing and managing multiple F0 contours in polyphonic music is challenging, and explicit F0 contour modeling has not yet been explored for polyphonic instrumental synthesis. In this paper, we present ViolinDiff, a two-stage diffusion-based synthesis framework. For a given violin MIDI file, the first stage estimates the F0 contour as pitch bend information, and the second stage generates a mel spectrogram incorporating these expressive details. Quantitative metrics and listening test results show that the proposed model generates more realistic violin sounds than the model without explicit pitch bend modeling. Audio samples are available online: daewoung.github.io/ViolinDiff-Demo.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_12477
institution	arXiv
publishDate	2024
record_format	arxiv
title	ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning
topic	Sound; Artificial Intelligence; Machine Learning; Audio and Speech Processing; Signal Processing
url	https://arxiv.org/abs/2409.12477