| Main Authors: | Weng, Tengjin, Wang, Jingyi, Jiang, Wenhao, Ming, Zhong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.14939 |
Similar Items
OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models
by: Weng, Tengjin, et al.
Published: (2026)
LlamaSeg: Image Segmentation via Autoregressive Mask Generation
by: Deng, Jiru, et al.
Published: (2025)
Accurate Segmentation of Optic Disc And Cup from Multiple Pseudo-labels by Noise-aware Learning
by: Weng, Tengjin, et al.
Published: (2023)
A Survey on Evaluation of Multimodal Large Language Models
by: Huang, Jiaxing, et al.
Published: (2024)
VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models
by: Sima, Bingrui, et al.
Published: (2025)
VisRes Bench: On Evaluating the Visual Reasoning Capabilities of VLMs
by: Törtei, Brigitta Malagurski, et al.
Published: (2025)
SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection
by: Qi, Tianye, et al.
Published: (2025)
EmoTrans: A Benchmark for Understanding, Reasoning, and Predicting Emotion Transitions in Multimodal LLMs
by: Hu, He, et al.
Published: (2026)
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
by: Liu, Xin, et al.
Published: (2023)
M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models
by: Weng, Ju-Hsuan, et al.
Published: (2025)
NavBench: Probing Multimodal Large Language Models for Embodied Navigation
by: Qiao, Yanyuan, et al.
Published: (2025)
KidVis: Do Multimodal Large Language Models Possess the Visual Perceptual Capabilities of a 6-Year-Old?
by: Wang, Xianfeng, et al.
Published: (2026)
ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models
by: De Min, Thomas, et al.
Published: (2026)
LithoBench: Benchmarking Large Multimodal Models for Remote-Sensing Lithology Interpretation
by: Wang, Jun, et al.
Published: (2026)
HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing
by: Zhang, Xinyu, et al.
Published: (2026)
VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents
by: Zhang, Zhengbo, et al.
Published: (2026)
PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
by: Chen, Liang, et al.
Published: (2024)
MIBench: Evaluating Multimodal Large Language Models over Multiple Images
by: Liu, Haowei, et al.
Published: (2024)
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
by: Ying, Kaining, et al.
Published: (2024)
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
by: Zhou, Pengfei, et al.
Published: (2025)
ENC-Bench: A Benchmark for Evaluating Multimodal Large Language Models in Electronic Navigational Chart Understanding
by: Cheng, Ao, et al.
Published: (2026)
SenseBench: A Benchmark for Remote Sensing Low-Level Visual Perception and Description in Large Vision-Language Models
by: Zhong, Chen, et al.
Published: (2026)
RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation
by: Niu, Tianyi, et al.
Published: (2025)
VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding
by: Chen, Jian, et al.
Published: (2025)
BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language Models
by: Tan, Bryan Chen Zhengyu, et al.
Published: (2025)
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
by: Wang, Shengkang, et al.
Published: (2024)
Med-RewardBench: Benchmarking Reward Models and Judges for Medical Multimodal Large Language Models
by: Ding, Meidan, et al.
Published: (2025)
Res-Bench: Benchmarking the Robustness of Multimodal Large Language Models to Dynamic Resolution Input
by: Li, Chenxu, et al.
Published: (2025)
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
by: Liu, Ziqiang, et al.
Published: (2024)
EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark
by: Li, Ming, et al.
Published: (2024)
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
by: Roberts, Jonathan, et al.
Published: (2025)
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
by: Wu, Yuhang, et al.
Published: (2024)
VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations
by: Xie, Yupeng, et al.
Published: (2025)
ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction
by: Guo, Zichun, et al.
Published: (2026)
VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models
by: Xu, Mingjie, et al.
Published: (2025)
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
by: Chen, Jiacheng, et al.
Published: (2024)
SAP-Bench: Benchmarking Multimodal Large Language Models in Surgical Action Planning
by: Xu, Mengya, et al.
Published: (2025)
Survey of Adversarial Robustness in Multimodal Large Language Models
by: Jiang, Chengze, et al.
Published: (2025)
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
by: Wang, Fengxiang, et al.
Published: (2025)
SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models
by: Wang, Hanqing, et al.
Published: (2025)