MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Luo, Jiaqi; Yuan, Yuan; Xu, Shixin
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2506.00813

_version_	1866910979649961984
author	Luo, Jiaqi; Yuan, Yuan; Xu, Shixin
author_facet	Luo, Jiaqi; Yuan, Yuan; Xu, Shixin
contents	Tabular-image multimodal learning, which integrates structured tabular data with imaging data, holds great promise for a variety of tasks, especially in medical applications. Yet, two key challenges remain: (1) the lack of a standardized, pretrained representation for tabular data, as is commonly available in vision and language domains; and (2) the difficulty of handling missing values in the tabular modality, which are common in real-world medical datasets. To address these issues, we propose the TabPFN-Integrated Multimodal Engine (TIME), a novel multimodal framework that builds on the recently introduced tabular foundation model, TabPFN. TIME leverages TabPFN as a frozen tabular encoder to generate robust, strong embeddings that are naturally resilient to missing data, and combines them with image features from pretrained vision backbones. We explore a range of fusion strategies and tabular encoders, and evaluate our approach on both natural and medical datasets. Extensive experiments demonstrate that TIME consistently outperforms competitive baselines across both complete and incomplete tabular inputs, underscoring its practical value in real-world multimodal learning scenarios.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_00813
institution	arXiv
publishDate	2025
record_format	arxiv
title	TIME: TabPFN-Integrated Multimodal Engine for Robust Tabular-Image Learning
topic	Computer Vision and Pattern Recognition; Machine Learning
url	https://arxiv.org/abs/2506.00813

Similar Items