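| Main Authors: | Lu, Shuo; Cheng, Jianjie; Xu, Yinuo; Yu, Yongcan; Sheng, Lijun; Wang, Peijie; Jiang, Siru; Hu, Yongguan; Ling, Run; Shao, Yihua; Ma, Ao; Feng, Wei; He, Lingxiao; Wang, Meng; Xie, Qianlong; Wang, Xingxing; Sebe, Nicu; He, Ran; Liang, Jian |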
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2602.11635 |
| _version_ | 1866908947104923648 |
|---|---|
| author | Lu, Shuo; Cheng, Jianjie; Xu, Yinuo; Yu, Yongcan; Sheng, Lijun; Wang, Peijie; Jiang, Siru; Hu, Yongguan; Ling, Run; Shao, Yihua; Ma, Ao; Feng, Wei; He, Lingxiao; Wang, Meng; Xie, Qianlong; Wang, Xingxing; Sebe, Nicu; He, Ran; Liang, Jian |
| author_facet | Lu, Shuo; Cheng, Jianjie; Xu, Yinuo; Yu, Yongcan; Sheng, Lijun; Wang, Peijie; Jiang, Siru; Hu, Yongguan; Ling, Run; Shao, Yihua; Ma, Ao; Feng, Wei; He, Lingxiao; Wang, Meng; Xie, Qianlong; Wang, Xingxing; Sebe, Nicu; He, Ran; Liang, Jian |
| contents | Multimodal large language models (MLLMs) have achieved strong performance on perception-oriented tasks, yet their ability to perform mathematical spatial reasoning, defined as the capacity to parse and manipulate two- and three-dimensional relations, remains unclear. Humans easily solve textbook-style spatial reasoning problems with over 95% accuracy, but we find that most leading MLLMs fail to reach even 60% on the same tasks. This striking gap highlights spatial reasoning as a fundamental weakness of current models. To investigate this gap, we present MathSpatial, the first large-scale and systematic dataset resource dedicated to mathematical spatial reasoning in MLLMs. MathSpatial provides two complementary subsets: (i) MathSpatial-Bench, a rigorously curated evaluation set of 2,000 problems spanning 3 categories and 11 subtypes, designed to isolate spatial reasoning from perceptual noise; and (ii) MathSpatial-Corpus, a training set of 8,000 problems equipped with verified solutions and structured reasoning traces. All problems are sourced from authentic educational materials and undergo multi-stage quality control including deduplication, geometric consistency checking, and cross-validated solution verification. Benchmarking 16 leading MLLMs on MathSpatial-Bench reveals that spatial reasoning remains a fundamental bottleneck: even GPT-5 lags behind human performance by over 35 percentage points, with particularly poor results on abstract deduction tasks. We further show that training on MathSpatial-Corpus yields consistent improvements across model families, demonstrating the dataset's practical value for advancing spatial reasoning capabilities. MathSpatial is publicly available at https://shuolucs.github.io/MathSpatial. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2602_11635 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Do MLLMs Really Understand Space? A Mathematical Reasoning Evaluation |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2602.11635 |