Bibliographic Details
Main Authors: Wu, Wen; Zhang, Chao; Wu, Xixin; Woodland, Philip C.
Format: Preprint
Published: 2022
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2203.04443
author Wu, Wen
Zhang, Chao
Wu, Xixin
Woodland, Philip C.
contents Emotion recognition is a key attribute for artificial intelligence systems that need to naturally interact with humans. However, the task definition is still an open problem due to the inherent ambiguity of emotions. In this paper, a novel Bayesian training loss based on per-utterance Dirichlet prior distributions is proposed for verbal emotion recognition, which models the uncertainty in one-hot labels created when human annotators assign the same utterance to different emotion classes. An additional metric is used to evaluate the performance by detecting test utterances with high labelling uncertainty. This removes a major limitation that emotion classification systems only consider utterances with labels where the majority of annotators agree on the emotion class. Furthermore, a frequentist approach is studied to leverage the continuous-valued "soft" labels obtained by averaging the one-hot labels. We propose a two-branch model structure for emotion classification on a per-utterance basis, which achieves state-of-the-art classification results on the widely used IEMOCAP dataset. Based on this, uncertainty estimation experiments were performed. The best performance in terms of the area under the precision-recall curve when detecting utterances with high uncertainty was achieved by interpolating the Bayesian training loss with the Kullback-Leibler divergence training loss for the soft labels. The generality of the proposed approach was verified using the MSP-Podcast dataset, which yielded the same pattern of results.
format Preprint
id arxiv_https___arxiv_org_abs_2203_04443
institution arXiv
publishDate 2022
record_format arxiv
title Estimating the Uncertainty in Emotion Class Labels with Utterance-Specific Dirichlet Priors
topic Computation and Language
url https://arxiv.org/abs/2203.04443
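
The abstract in this record mentions "soft" labels obtained by averaging annotators' one-hot labels, a Kullback-Leibler divergence loss against those soft labels, and utterances on which no majority of annotators agree. A minimal sketch of those three ideas follows. It is not the authors' code: the function names, the toy labels, and the predicted distribution are all hypothetical, and the Dirichlet-prior Bayesian loss itself is not reproduced here because the record gives no formula for it.

```python
import math

def soft_label(one_hot_labels):
    """Average the annotators' one-hot labels into one soft label."""
    n = len(one_hot_labels)
    k = len(one_hot_labels[0])
    return [sum(label[c] for label in one_hot_labels) / n for c in range(k)]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); classes with zero target mass contribute 0."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0
    )

def has_majority(one_hot_labels):
    """True if more than half of the annotators chose the same class."""
    return max(soft_label(one_hot_labels)) > 0.5

# Hypothetical example: three annotators, four emotion classes.
labels = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
target = soft_label(labels)            # two thirds / one third / 0 / 0
predicted = [0.7, 0.2, 0.05, 0.05]     # assumed model output
loss = kl_divergence(target, predicted)
```

An utterance for which `has_majority` returns False is the kind the abstract says conventional classification setups discard; keeping the full soft label retains the disagreement information instead.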