Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Gui; Wennuo, Yang; Ma, Xusen; Zhong, Zehao; Wu, Zhuoru; Wu, Ende; Qu, Rong; Cheah, Wooi Ping; Ren, Jianfeng; Shen, Linlin
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.15596

_version_	1866915530338729984
author	Wang, Gui; Wennuo, Yang; Ma, Xusen; Zhong, Zehao; Wu, Zhuoru; Wu, Ende; Qu, Rong; Cheah, Wooi Ping; Ren, Jianfeng; Shen, Linlin
author_facet	Wang, Gui; Wennuo, Yang; Ma, Xusen; Zhong, Zehao; Wu, Zhuoru; Wu, Ende; Qu, Rong; Cheah, Wooi Ping; Ren, Jianfeng; Shen, Linlin
contents	MLLMs (Multimodal Large Language Models) have showcased remarkable capabilities, but their performance in high-stakes, domain-specific scenarios such as surgical settings remains largely under-explored. To address this gap, we develop EyePCR, a large-scale benchmark for ophthalmic surgery analysis, grounded in structured clinical knowledge to evaluate cognition across Perception, Comprehension and Reasoning. EyePCR offers a richly annotated corpus with more than 210k VQAs, which cover 1048 fine-grained attributes for multi-view perception, a medical knowledge graph of more than 25k triplets for comprehension, and four clinically grounded reasoning tasks. The rich annotations facilitate in-depth cognitive analysis, simulating how surgeons perceive visual cues and combine them with domain knowledge to make decisions, thus greatly improving models' cognitive ability. In particular, EyePCR-MLLM, a domain-adapted variant of Qwen2.5-VL-7B, achieves the highest accuracy on MCQs for Perception among compared models and outperforms open-source models in Comprehension and Reasoning, rivalling commercial models like GPT-4.1. EyePCR reveals the limitations of existing MLLMs in surgical cognition and lays the foundation for benchmarking and enhancing the clinical reliability of surgical video understanding models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_15596
institution	arXiv
publishDate	2025
record_format	arxiv
title	EyePCR: A Comprehensive Benchmark for Fine-Grained Perception, Knowledge Comprehension and Clinical Reasoning in Ophthalmic Surgery
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2509.15596

Similar Items