Bibliographic Details
Main Authors: Sui, Xiangjie, Li, Songyang, Zhu, Hanwei, Chen, Baoliang, Fang, Yuming, Sun, Xin
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2511.19032
author Sui, Xiangjie
Li, Songyang
Zhu, Hanwei
Chen, Baoliang
Fang, Yuming
Sun, Xin
contents Despite the remarkable reasoning abilities of large vision-language models (LVLMs), their robustness under visual corruptions remains insufficiently studied. Existing evaluation paradigms exhibit two major limitations: 1) the dominance of low-discriminative samples in current datasets masks the real robustness gap between models; and 2) conventional accuracy-based metrics fail to capture the degradation of the underlying prediction structure. To bridge these gaps, we introduce Bench-C, a comprehensive benchmark emphasizing discriminative samples for assessing corruption robustness, where a selection strategy is proposed to jointly consider the prediction inconsistency under corruption and the semantic diversity. Furthermore, we propose the Robustness Alignment Score (RAS), a unified metric that measures degradation in logit-level prediction structure by considering the shifts in prediction uncertainty and calibration alignment. Comprehensive experiments and analysis reveal several interesting findings: 1) model behaviors exhibit distinct patterns under corruptions, such as erroneous confidence and hesitation; 2) although subtle corruption may lead to a slight accuracy gain, the overall prediction structure still degrades; 3) by decomposing corruption robustness into destructive and corrective components, the distinct failure and recovery patterns across models can be revealed.
format Preprint
id arxiv_https___arxiv_org_abs_2511_19032
institution arXiv
publishDate 2025
record_format arxiv
title Benchmarking Corruption Robustness of LVLMs: A Discriminative Benchmark and Robustness Alignment Metric
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2511.19032