MARC21 :: Library Catalog

Bibliographic Details
Main authors:	Yang, Chen; Lin, Guanxin; He, Youquan; Chen, Peiyao; Liu, Guanghe; Mo, Yufan; Xu, Zhouyuan; Wang, Linhao; Zhang, Guohui; Zhang, Zihang; Zeng, Shenxiang; Wang, Chen; Fan, Jiansheng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online access:	https://arxiv.org/abs/2602.07864

_version_	1866910271107235840
author	Yang, Chen; Lin, Guanxin; He, Youquan; Chen, Peiyao; Liu, Guanghe; Mo, Yufan; Xu, Zhouyuan; Wang, Linhao; Zhang, Guohui; Zhang, Zihang; Zeng, Shenxiang; Wang, Chen; Fan, Jiansheng
author_facet	Yang, Chen; Lin, Guanxin; He, Youquan; Chen, Peiyao; Liu, Guanghe; Mo, Yufan; Xu, Zhouyuan; Wang, Linhao; Zhang, Guohui; Zhang, Zihang; Zeng, Shenxiang; Wang, Chen; Fan, Jiansheng
contents	Spatial intelligence is crucial for vision-language models (VLMs), yet many scene-centric benchmarks evaluate unconstrained environments where a single image may admit multiple plausible 3D interpretations. We introduce SSI-Bench, a VQA benchmark for Structure-Centric Spatial Reasoning (SCSR) in constraint-governed spaces. Built from complex real-world 3D structures, it uses structural constraints from geometry, topology, and physical feasibility to make component relations more determinate from visual evidence. The benchmark contains 1,000 ranking questions spanning geometric and topological reasoning, where correct ordering requires resolving all candidate-wise 3D relations, imposing stronger demands on spatial understanding. It is created through a fully human-centered pipeline with over 400 researcher-hours of image curation, component annotation, and question design. Evaluating 31 VLMs reveals a large gap to humans: the best open-source model achieves 22.2% accuracy and the strongest closed-source model reaches 33.6%, while humans score 91.6%. Further results show that chain-of-thought reasoning brings only marginal gains, and error analysis reveals fundamental limitations in current models' spatial understanding within constraint-governed spaces. Project page: https://ssi-bench.github.io.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_07864
institution	arXiv
publishDate	2026
record_format	arxiv
title	Thinking in Structures: Evaluating Spatial Intelligence in Constraint-Governed Spaces
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.07864
