Staff View: Uncertainty-Aware Multimodal Emotion Recognition through Dirichlet Parameterization :: Library Catalog

Bibliographic Details
Main Authors:	Grzeczkowicz, Rémi; Soriano, Eric; Janati, Ali; Zhang, Miyu; Comas-Quiles, Gerard; Araruna, Victor Carballo; Jonelagadda, Aneesh
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.09121

_version_	1866910017413709824
author	Grzeczkowicz, Rémi; Soriano, Eric; Janati, Ali; Zhang, Miyu; Comas-Quiles, Gerard; Araruna, Victor Carballo; Jonelagadda, Aneesh
author_facet	Grzeczkowicz, Rémi; Soriano, Eric; Janati, Ali; Zhang, Miyu; Comas-Quiles, Gerard; Araruna, Victor Carballo; Jonelagadda, Aneesh
contents	In this work, we present a lightweight and privacy-preserving Multimodal Emotion Recognition (MER) framework designed for deployment on edge devices. To demonstrate the framework's versatility, our implementation uses three modalities: speech, text, and facial imagery. However, the system is fully modular and can be extended to support other modalities or tasks. Each modality is processed through a dedicated backbone optimized for inference efficiency: Emotion2Vec for speech, a ResNet-based model for facial expressions, and DistilRoBERTa for text. To reconcile uncertainty across modalities, we introduce a model- and task-agnostic fusion mechanism grounded in Dempster-Shafer theory and Dirichlet evidence. Operating directly on model logits, this approach captures predictive uncertainty without requiring additional training or joint distribution estimation, making it broadly applicable beyond emotion recognition. Validation on five benchmark datasets (eNTERFACE05, MEAD, MELD, RAVDESS, and CREMA-D) shows that our method achieves competitive accuracy while remaining computationally efficient and robust to ambiguous or missing inputs. Overall, the proposed framework emphasizes modularity, scalability, and real-world feasibility, paving the way toward uncertainty-aware multimodal systems for healthcare, human-computer interaction, and other emotion-informed applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_09121
institution	arXiv
publishDate	2026
record_format	arxiv
title	Uncertainty-Aware Multimodal Emotion Recognition through Dirichlet Parameterization
topic	Artificial Intelligence
url	https://arxiv.org/abs/2602.09121
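
Illustrative note: the contents field above describes fusing per-modality logits through Dirichlet evidence and Dempster-Shafer theory, but the record does not give the formulas. The following is a minimal sketch of one common way such a fusion is built, not the authors' implementation. Everything specific here is an assumption: the softplus mapping from logits to evidence, the reduced Dempster combination rule for subjective-logic opinions, the treatment of a missing modality as a vacuous opinion, and the function names `opinion_from_logits` and `combine`.

```python
import math


def softplus(x):
    # Numerically stable softplus; assumed here as the logit-to-evidence map.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def opinion_from_logits(logits):
    """Turn one modality's logits into (beliefs, uncertainty).

    Evidence e_k >= 0 defines Dirichlet parameters alpha_k = e_k + 1.
    With strength S = sum(alpha), belief b_k = e_k / S and u = K / S,
    so the beliefs and the uncertainty mass sum to 1.
    """
    evidence = [softplus(z) for z in logits]
    k = len(evidence)
    strength = sum(evidence) + k
    beliefs = [e / strength for e in evidence]
    return beliefs, k / strength


def combine(op1, op2):
    """Reduced Dempster rule for two opinions over the same classes."""
    b1, u1 = op1
    b2, u2 = op2
    # Conflict: belief mass the two sources assign to different classes.
    conflict = sum(
        b1[i] * b2[j]
        for i in range(len(b1))
        for j in range(len(b2))
        if i != j
    )
    scale = 1.0 / (1.0 - conflict)
    beliefs = [scale * (x * y + x * u2 + y * u1) for x, y in zip(b1, b2)]
    return beliefs, scale * u1 * u2


# Hypothetical 3-class logits from two modality backbones.
face = opinion_from_logits([4.0, 0.5, -1.0])
speech = opinion_from_logits([3.0, 1.0, -2.0])
fused = combine(face, speech)

# A missing modality modeled as a vacuous opinion (no belief, u = 1).
missing = ([0.0, 0.0, 0.0], 1.0)
fused_with_missing = combine(face, missing)
```

Under these assumptions, combining with a vacuous opinion leaves the other opinion unchanged, which is one way a fusion rule of this kind can tolerate a missing input, and combining two opinions always yields a lower uncertainty mass than either input carries.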

Similar Items