Bibliographic Details
Main Authors: Zhang, Xingxuan; Li, Jiansheng; Chu, Wenjing; Hai, Junjia; Xu, Renzhe; Yang, Yuqing; Guan, Shikai; Xu, Jiazheng; Cui, Peng
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access: https://arxiv.org/abs/2402.06599
author Zhang, Xingxuan
Li, Jiansheng
Chu, Wenjing
Hai, Junjia
Xu, Renzhe
Yang, Yuqing
Guan, Shikai
Xu, Jiazheng
Cui, Peng
contents We investigate the generalization boundaries of current Multimodal Large Language Models (MLLMs) via comprehensive evaluation under out-of-distribution scenarios and domain-specific tasks. We evaluate their zero-shot generalization across synthetic images, real-world distributional shifts, and specialized datasets like medical and molecular imagery. Empirical results indicate that MLLMs struggle with generalization beyond common training domains, limiting their direct application without adaptation. To understand the cause of unreliable performance, we analyze three hypotheses: semantic misinterpretation, visual feature extraction insufficiency, and mapping deficiency. Results identify mapping deficiency as the primary hurdle. To address this problem, we show that in-context learning (ICL) can significantly enhance MLLMs' generalization, opening new avenues for overcoming generalization barriers. We further explore the robustness of ICL under distribution shifts and show its vulnerability to domain shifts, label shifts, and spurious correlation shifts between in-context examples and test data.
format Preprint
id arxiv_https___arxiv_org_abs_2402_06599
institution arXiv
publishDate 2024
record_format arxiv
title On the Out-Of-Distribution Generalization of Multimodal Large Language Models
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2402.06599