Staff View: On the Out-Of-Distribution Generalization of Multimodal Large Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Xingxuan; Li, Jiansheng; Chu, Wenjing; Hai, Junjia; Xu, Renzhe; Yang, Yuqing; Guan, Shikai; Xu, Jiazheng; Cui, Peng
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2402.06599

_version_	1866911774248271872
author	Zhang, Xingxuan; Li, Jiansheng; Chu, Wenjing; Hai, Junjia; Xu, Renzhe; Yang, Yuqing; Guan, Shikai; Xu, Jiazheng; Cui, Peng
author_facet	Zhang, Xingxuan; Li, Jiansheng; Chu, Wenjing; Hai, Junjia; Xu, Renzhe; Yang, Yuqing; Guan, Shikai; Xu, Jiazheng; Cui, Peng
contents	We investigate the generalization boundaries of current Multimodal Large Language Models (MLLMs) via comprehensive evaluation under out-of-distribution scenarios and domain-specific tasks. We evaluate their zero-shot generalization across synthetic images, real-world distributional shifts, and specialized datasets like medical and molecular imagery. Empirical results indicate that MLLMs struggle with generalization beyond common training domains, limiting their direct application without adaptation. To understand the cause of unreliable performance, we analyze three hypotheses: semantic misinterpretation, visual feature extraction insufficiency, and mapping deficiency. Results identify mapping deficiency as the primary hurdle. To address this problem, we show that in-context learning (ICL) can significantly enhance MLLMs' generalization, opening new avenues for overcoming generalization barriers. We further explore the robustness of ICL under distribution shifts and show its vulnerability to domain shifts, label shifts, and spurious correlation shifts between in-context examples and test data.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_06599
institution	arXiv
publishDate	2024
record_format	arxiv
title	On the Out-Of-Distribution Generalization of Multimodal Large Language Models
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2402.06599