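Staff View: CArtBench: Evaluating Vision-Language Models on Chinese Art Understanding, Interpretation, and Authenticity :: Library Catalog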

Bibliographic Details
Main Authors:	Wei, Xuefeng; Wang, Zhixuan; Zhou, Xuan; Qu, Zhi; Li, Hongyao; Sakai, Yusuke; Kamigaito, Hidetaka; Watanabe, Taro
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.11632

_version_	1866911715108585472
author	Wei, Xuefeng; Wang, Zhixuan; Zhou, Xuan; Qu, Zhi; Li, Hongyao; Sakai, Yusuke; Kamigaito, Hidetaka; Watanabe, Taro
author_facet	Wei, Xuefeng; Wang, Zhixuan; Zhou, Xuan; Qu, Zhi; Li, Hongyao; Sakai, Yusuke; Kamigaito, Hidetaka; Watanabe, Taro
contents	We introduce CARTBENCH, a museum-grounded benchmark for evaluating vision-language models (VLMs) on Chinese artworks beyond short-form recognition and QA. CARTBENCH comprises four subtasks: CURATORQA for evidence-grounded recognition and reasoning, CATALOGCAPTION for structured four-section expert-style appreciation, REINTERPRET for defensible reinterpretation with expert ratings, and CONNOISSEURPAIRS for diagnostic authenticity discrimination under visually similar confounds. CARTBENCH is built by aligning image-bearing Palace Museum objects from Wikidata with authoritative catalog pages, spanning five art categories across multiple dynasties. Across nine representative VLMs, we find that high overall CURATORQA accuracy can mask sharp drops on hard evidence linking and style-to-period inference; long-form appreciation remains far from expert references; and authenticity-oriented diagnostic discrimination stays near chance, underscoring the difficulty of connoisseur-level reasoning for current models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_11632
institution	arXiv
publishDate	2026
record_format	arxiv
title	CArtBench: Evaluating Vision-Language Models on Chinese Art Understanding, Interpretation, and Authenticity
topic	Computation and Language
url	https://arxiv.org/abs/2604.11632

Similar Items