Bibliographic Details
Main Authors: Nguyen, Duy A., Do, Quan Huu, Doan, Khoa D., Do, Minh N.
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2504.13465
author Nguyen, Duy A.
Do, Quan Huu
Doan, Khoa D.
Do, Minh N.
contents Multimodal learning has demonstrated incredible successes by integrating diverse data sources, yet it often relies on the availability of all modalities - an assumption that rarely holds in real-world applications. Pretrained multimodal models, while effective, struggle when confronted with small-scale and incomplete datasets (i.e., missing modalities), limiting their practical applicability. Previous studies on reconstructing missing modalities have overlooked the reconstruction's potential unreliability, which could compromise the quality of the final outputs. We present SURE (Scalable Uncertainty and Reconstruction Estimation), a novel framework that extends the capabilities of pretrained multimodal models by introducing latent space reconstruction and uncertainty estimation for both reconstructed modalities and downstream tasks. Our method is architecture-agnostic, reconstructs missing modalities, and delivers reliable uncertainty estimates, improving both interpretability and performance. SURE introduces a unique Pearson Correlation-based loss and applies statistical error propagation in deep networks for the first time, allowing precise quantification of uncertainties from missing data and model predictions. Extensive experiments across tasks such as sentiment analysis, genre classification, and action recognition show that SURE consistently achieves state-of-the-art performance, ensuring robust predictions even in the presence of incomplete data.
format Preprint
id arxiv_https___arxiv_org_abs_2504_13465
institution arXiv
publishDate 2025
record_format arxiv
title Are you SURE? Enhancing Multimodal Pretraining with Missing Modalities through Uncertainty Estimation
topic Machine Learning
url https://arxiv.org/abs/2504.13465