Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Devulapally, Naresh Kumar, Anand, Sidharth, Bhattacharjee, Sreyasee Das, Yuan, Junsong
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence Computer Vision and Pattern Recognition Machine Learning
Online Access:	https://arxiv.org/abs/2402.10921
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866917591494164480
author	Devulapally, Naresh Kumar Anand, Sidharth Bhattacharjee, Sreyasee Das Yuan, Junsong
author_facet	Devulapally, Naresh Kumar Anand, Sidharth Bhattacharjee, Sreyasee Das Yuan, Junsong
contents	Human emotion can be presented in different modes i.e., audio, video, and text. However, the contribution of each mode in exhibiting each emotion is not uniform. Furthermore, the availability of complete mode-specific details may not always be guaranteed in the test time. In this work, we propose AM^2-EmoJE, a model for Adaptive Missing-Modality Emotion Recognition in Conversation via Joint Embedding Learning model that is grounded on two-fold contributions: First, a query adaptive fusion that can automatically learn the relative importance of its mode-specific representations in a query-specific manner. By this the model aims to prioritize the mode-invariant spatial query details of the emotion patterns, while also retaining its mode-exclusive aspects within the learned multimodal query descriptor. Second the multimodal joint embedding learning module that explicitly addresses various missing modality scenarios in test-time. By this, the model learns to emphasize on the correlated patterns across modalities, which may help align the cross-attended mode-specific descriptors pairwise within a joint-embedding space and thereby compensate for missing modalities during inference. By leveraging the spatio-temporal details at the dialogue level, the proposed AM^2-EmoJE not only demonstrates superior performance compared to the best-performing state-of-the-art multimodal methods, by effectively leveraging body language in place of face expression, it also exhibits an enhanced privacy feature. By reporting around 2-5% improvement in the weighted-F1 score, the proposed multimodal joint embedding module facilitates an impressive performance gain in a variety of missing-modality query scenarios during test time.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_10921
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	AM^2-EmoJE: Adaptive Missing-Modality Emotion Recognition in Conversation via Joint Embedding Learning Devulapally, Naresh Kumar Anand, Sidharth Bhattacharjee, Sreyasee Das Yuan, Junsong Artificial Intelligence Computer Vision and Pattern Recognition Machine Learning Human emotion can be presented in different modes i.e., audio, video, and text. However, the contribution of each mode in exhibiting each emotion is not uniform. Furthermore, the availability of complete mode-specific details may not always be guaranteed in the test time. In this work, we propose AM^2-EmoJE, a model for Adaptive Missing-Modality Emotion Recognition in Conversation via Joint Embedding Learning model that is grounded on two-fold contributions: First, a query adaptive fusion that can automatically learn the relative importance of its mode-specific representations in a query-specific manner. By this the model aims to prioritize the mode-invariant spatial query details of the emotion patterns, while also retaining its mode-exclusive aspects within the learned multimodal query descriptor. Second the multimodal joint embedding learning module that explicitly addresses various missing modality scenarios in test-time. By this, the model learns to emphasize on the correlated patterns across modalities, which may help align the cross-attended mode-specific descriptors pairwise within a joint-embedding space and thereby compensate for missing modalities during inference. By leveraging the spatio-temporal details at the dialogue level, the proposed AM^2-EmoJE not only demonstrates superior performance compared to the best-performing state-of-the-art multimodal methods, by effectively leveraging body language in place of face expression, it also exhibits an enhanced privacy feature. By reporting around 2-5% improvement in the weighted-F1 score, the proposed multimodal joint embedding module facilitates an impressive performance gain in a variety of missing-modality query scenarios during test time.
title	AM^2-EmoJE: Adaptive Missing-Modality Emotion Recognition in Conversation via Joint Embedding Learning
topic	Artificial Intelligence Computer Vision and Pattern Recognition Machine Learning
url	https://arxiv.org/abs/2402.10921

Similar Items