Staff View: Cluster-to-Predict Affect Contours from Speech :: Library Catalog

Bibliographic Details
Main Authors:	Kuşçu, Gökhan; Erzin, Engin
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Human-Computer Interaction
Online Access:	https://arxiv.org/abs/2406.02569

_version_	1866911904686931968
author	Kuşçu, Gökhan; Erzin, Engin
author_facet	Kuşçu, Gökhan; Erzin, Engin
contents	Continuous emotion recognition (CER) aims to track the dynamic changes in a person's emotional state over time. This paper proposes a novel approach to translating CER into a prediction problem of dynamic affect-contour clusters from speech, where the affect-contour is defined as the contour of annotated affect attributes in a temporal window. Our approach defines a cluster-to-predict (C2P) framework that learns affect-contour clusters, which are predicted from speech with higher precision. To achieve this, C2P runs an unsupervised iterative optimization process to learn affect-contour clusters by minimizing both clustering loss and speech-driven affect-contour prediction loss. Our objective findings demonstrate the value of speech-driven clustering for both arousal and valence attributes. Experiments conducted on the RECOLA dataset yielded promising classification results, with F1 scores of 0.84 for arousal and 0.75 for valence in our four-class speech-driven affect-contour prediction model.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_02569
institution	arXiv
publishDate	2024
record_format	arxiv
title	Cluster-to-Predict Affect Contours from Speech
topic	Audio and Speech Processing; Human-Computer Interaction
url	https://arxiv.org/abs/2406.02569

Similar Items