Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Wang, Gui, Zhong, Zehao, Zhou, YongSong, Li, Yudong, Wu, Ende, Cheah, Wooi Ping, Qu, Rong, Ren, Jianfeng, Shen, Linlin
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.20350
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

Despite significant progress in Multi-modal Large Language Models (MLLMs), their clinical reasoning capacity for multi-modal diagnosis remains largely unexamined. Current benchmarks, mostly single-modality data, can't evaluate progressive reasoning and cross-modal integration essential for clinical practice. We introduce the Cross-Modality Progressive Clinical Reasoning (X-PCR) benchmark, the first comprehensive evaluation of MLLMs through a complete ophthalmology diagnostic workflow, with two reasoning tasks: 1) a six-stage progressive reasoning chain spanning image quality assessment to clinical decision-making, and 2) a cross-modality reasoning task integrating six imaging modalities. The benchmark comprises 26,415 images and 177,868 expert-verified VQA pairs curated from 51 public datasets, covering 52 ophthalmic diseases. Evaluation of 21 MLLMs reveals critical gaps in progressive reasoning and cross-modal integration. Dataset and code: https://github.com/CVI-SZU/X-PCR.

Similar Items