MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Yunyi; Jin, Craig
Format:	Preprint
Published:	2024
Subjects:	Sound Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2406.07131

_version_	1866914831634792448
author	Liu, Yunyi; Jin, Craig
author_facet	Liu, Yunyi; Jin, Craig
contents	Neural audio synthesis methods can achieve high-fidelity and realistic sound generation by utilizing deep generative models. Such models typically rely on external labels, which are often discrete, as conditioning information to achieve guided sound generation. However, it remains difficult to control subtle changes in sounds without appropriate and descriptive labels, especially given a limited dataset. This paper proposes an implicit conditioning method for neural audio synthesis using generative adversarial networks that allows for interpretable control of the acoustic features of synthesized sounds. Our technique creates a continuous conditioning space that enables timbre manipulation without relying on explicit labels. We further introduce an evaluation metric to explore controllability and demonstrate that our approach is effective in enabling a degree of controlled variation of different synthesized sound effects for in-domain and cross-domain sounds.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_07131
institution	arXiv
publishDate	2024
record_format	arxiv
title	ICGAN: An implicit conditioning method for interpretable feature control of neural audio synthesis
topic	Sound Audio and Speech Processing
url	https://arxiv.org/abs/2406.07131
