Bibliographic Details
Main Authors: Grzeczkowicz, Rémi, Soriano, Eric, Janati, Ali, Zhang, Miyu, Comas-Quiles, Gerard, Araruna, Victor Carballo, Jonelagadda, Aneesh
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2602.09121
_version_ 1866910017413709824
author Grzeczkowicz, Rémi
Soriano, Eric
Janati, Ali
Zhang, Miyu
Comas-Quiles, Gerard
Araruna, Victor Carballo
Jonelagadda, Aneesh
author_facet Grzeczkowicz, Rémi
Soriano, Eric
Janati, Ali
Zhang, Miyu
Comas-Quiles, Gerard
Araruna, Victor Carballo
Jonelagadda, Aneesh
contents In this work, we present a lightweight and privacy-preserving Multimodal Emotion Recognition (MER) framework designed for deployment on edge devices. To demonstrate the framework's versatility, our implementation uses three modalities: speech, text, and facial imagery. However, the system is fully modular and can be extended to support other modalities or tasks. Each modality is processed through a dedicated backbone optimized for inference efficiency: Emotion2Vec for speech, a ResNet-based model for facial expressions, and DistilRoBERTa for text. To reconcile uncertainty across modalities, we introduce a model- and task-agnostic fusion mechanism grounded in Dempster-Shafer theory and Dirichlet evidence. Operating directly on model logits, this approach captures predictive uncertainty without requiring additional training or joint distribution estimation, making it broadly applicable beyond emotion recognition. Validation on five benchmark datasets (eNTERFACE05, MEAD, MELD, RAVDESS, and CREMA-D) shows that our method achieves competitive accuracy while remaining computationally efficient and robust to ambiguous or missing inputs. Overall, the proposed framework emphasizes modularity, scalability, and real-world feasibility, paving the way toward uncertainty-aware multimodal systems for healthcare, human-computer interaction, and other emotion-informed applications.
format Preprint
id arxiv_https___arxiv_org_abs_2602_09121
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Uncertainty-Aware Multimodal Emotion Recognition through Dirichlet Parameterization
Grzeczkowicz, Rémi
Soriano, Eric
Janati, Ali
Zhang, Miyu
Comas-Quiles, Gerard
Araruna, Victor Carballo
Jonelagadda, Aneesh
Artificial Intelligence
title Uncertainty-Aware Multimodal Emotion Recognition through Dirichlet Parameterization
topic Artificial Intelligence
url https://arxiv.org/abs/2602.09121