Staff View: Do MLLMs Really Understand Space? A Mathematical Reasoning Evaluation :: Library Catalog

Bibliographic Details
Main Authors:	Lu, Shuo; Cheng, Jianjie; Xu, Yinuo; Yu, Yongcan; Sheng, Lijun; Wang, Peijie; Jiang, Siru; Hu, Yongguan; Ling, Run; Shao, Yihua; Ma, Ao; Feng, Wei; He, Lingxiao; Wang, Meng; Xie, Qianlong; Wang, Xingxing; Sebe, Nicu; He, Ran; Liang, Jian
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.11635

_version_	1866908947104923648
author	Lu, Shuo; Cheng, Jianjie; Xu, Yinuo; Yu, Yongcan; Sheng, Lijun; Wang, Peijie; Jiang, Siru; Hu, Yongguan; Ling, Run; Shao, Yihua; Ma, Ao; Feng, Wei; He, Lingxiao; Wang, Meng; Xie, Qianlong; Wang, Xingxing; Sebe, Nicu; He, Ran; Liang, Jian
contents	Multimodal large language models (MLLMs) have achieved strong performance on perception-oriented tasks, yet their ability to perform mathematical spatial reasoning, defined as the capacity to parse and manipulate two- and three-dimensional relations, remains unclear. Humans easily solve textbook-style spatial reasoning problems with over 95% accuracy, but we find that most leading MLLMs fail to reach even 60% on the same tasks. This striking gap highlights spatial reasoning as a fundamental weakness of current models. To investigate this gap, we present MathSpatial, the first large-scale and systematic dataset resource dedicated to mathematical spatial reasoning in MLLMs. MathSpatial provides two complementary subsets: (i) MathSpatial-Bench, a rigorously curated evaluation set of 2,000 problems spanning 3 categories and 11 subtypes, designed to isolate spatial reasoning from perceptual noise; and (ii) MathSpatial-Corpus, a training set of 8,000 problems equipped with verified solutions and structured reasoning traces. All problems are sourced from authentic educational materials and undergo multi-stage quality control including deduplication, geometric consistency checking, and cross-validated solution verification. Benchmarking 16 leading MLLMs on MathSpatial-Bench reveals that spatial reasoning remains a fundamental bottleneck: even GPT-5 lags behind human performance by over 35 percentage points, with particularly poor results on abstract deduction tasks. We further show that training on MathSpatial-Corpus yields consistent improvements across model families, demonstrating the dataset's practical value for advancing spatial reasoning capabilities. MathSpatial is publicly available at https://shuolucs.github.io/MathSpatial.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_11635
institution	arXiv
publishDate	2026
record_format	arxiv
title	Do MLLMs Really Understand Space? A Mathematical Reasoning Evaluation
topic	Artificial Intelligence
url	https://arxiv.org/abs/2602.11635
