Enregistré dans:
Détails bibliographiques
Auteurs principaux: Luo, Jiaqi, Yuan, Yuan, Xu, Shixin
Format: Preprint
Publié: 2025
Sujets:
Accès en ligne:https://arxiv.org/abs/2506.00813
Tags: Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
_version_ 1866910979649961984
author Luo, Jiaqi
Yuan, Yuan
Xu, Shixin
author_facet Luo, Jiaqi
Yuan, Yuan
Xu, Shixin
contents Tabular-image multimodal learning, which integrates structured tabular data with imaging data, holds great promise for a variety of tasks, especially in medical applications. Yet, two key challenges remain: (1) the lack of a standardized, pretrained representation for tabular data, as is commonly available in vision and language domains; and (2) the difficulty of handling missing values in the tabular modality, which are common in real-world medical datasets. To address these issues, we propose the TabPFN-Integrated Multimodal Engine (TIME), a novel multimodal framework that builds on the recently introduced tabular foundation model, TabPFN. TIME leverages TabPFN as a frozen tabular encoder to generate robust, strong embeddings that are naturally resilient to missing data, and combines them with image features from pretrained vision backbones. We explore a range of fusion strategies and tabular encoders, and evaluate our approach on both natural and medical datasets. Extensive experiments demonstrate that TIME consistently outperforms competitive baselines across both complete and incomplete tabular inputs, underscoring its practical value in real-world multimodal learning scenarios.
format Preprint
id arxiv_https___arxiv_org_abs_2506_00813
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle TIME: TabPFN-Integrated Multimodal Engine for Robust Tabular-Image Learning
Luo, Jiaqi
Yuan, Yuan
Xu, Shixin
Computer Vision and Pattern Recognition
Machine Learning
Tabular-image multimodal learning, which integrates structured tabular data with imaging data, holds great promise for a variety of tasks, especially in medical applications. Yet, two key challenges remain: (1) the lack of a standardized, pretrained representation for tabular data, as is commonly available in vision and language domains; and (2) the difficulty of handling missing values in the tabular modality, which are common in real-world medical datasets. To address these issues, we propose the TabPFN-Integrated Multimodal Engine (TIME), a novel multimodal framework that builds on the recently introduced tabular foundation model, TabPFN. TIME leverages TabPFN as a frozen tabular encoder to generate robust, strong embeddings that are naturally resilient to missing data, and combines them with image features from pretrained vision backbones. We explore a range of fusion strategies and tabular encoders, and evaluate our approach on both natural and medical datasets. Extensive experiments demonstrate that TIME consistently outperforms competitive baselines across both complete and incomplete tabular inputs, underscoring its practical value in real-world multimodal learning scenarios.
title TIME: TabPFN-Integrated Multimodal Engine for Robust Tabular-Image Learning
topic Computer Vision and Pattern Recognition
Machine Learning
url https://arxiv.org/abs/2506.00813