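Staff View: Benchmarking Corruption Robustness of LVLMs: A Discriminative Benchmark and Robustness Alignment Metric :: Library Catalog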

Bibliographic Details
Main Authors:	Sui, Xiangjie; Li, Songyang; Zhu, Hanwei; Chen, Baoliang; Fang, Yuming; Sun, Xin
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.19032

_version_	1866917100917882880
author	Sui, Xiangjie; Li, Songyang; Zhu, Hanwei; Chen, Baoliang; Fang, Yuming; Sun, Xin
author_facet	Sui, Xiangjie; Li, Songyang; Zhu, Hanwei; Chen, Baoliang; Fang, Yuming; Sun, Xin
contents	Despite the remarkable reasoning abilities of large vision-language models (LVLMs), their robustness under visual corruptions remains insufficiently studied. Existing evaluation paradigms exhibit two major limitations: 1) the dominance of low-discriminative samples in current datasets masks the real robustness gap between models; and 2) conventional accuracy-based metrics fail to capture the degradation of the underlying prediction structure. To bridge these gaps, we introduce Bench-C, a comprehensive benchmark emphasizing discriminative samples for assessing corruption robustness, where a selection strategy is proposed to jointly consider the prediction inconsistency under corruption and the semantic diversity. Furthermore, we propose the Robustness Alignment Score (RAS), a unified metric that measures degradation in logit-level prediction structure by considering the shifts in prediction uncertainty and calibration alignment. Comprehensive experiments and analysis reveal several interesting findings: 1) model behaviors exhibit distinct patterns under corruptions, such as erroneous confidence and hesitation; 2) although subtle corruption may lead to a slight accuracy gain, the overall prediction structure still degrades; 3) by decomposing corruption robustness into destructive and corrective components, the distinct failure and recovery patterns across models can be revealed.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_19032
institution	arXiv
publishDate	2025
record_format	arxiv
title	Benchmarking Corruption Robustness of LVLMs: A Discriminative Benchmark and Robustness Alignment Metric
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2511.19032

Similar Items