Bibliographic Details
Main Authors: Lu, Shuo, Cheng, Jianjie, Xu, Yinuo, Yu, Yongcan, Sheng, Lijun, Wang, Peijie, Jiang, Siru, Hu, Yongguan, Ling, Run, Shao, Yihua, Ma, Ao, Feng, Wei, He, Lingxiao, Wang, Meng, Xie, Qianlong, Wang, Xingxing, Sebe, Nicu, He, Ran, Liang, Jian
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2602.11635
author Lu, Shuo
Cheng, Jianjie
Xu, Yinuo
Yu, Yongcan
Sheng, Lijun
Wang, Peijie
Jiang, Siru
Hu, Yongguan
Ling, Run
Shao, Yihua
Ma, Ao
Feng, Wei
He, Lingxiao
Wang, Meng
Xie, Qianlong
Wang, Xingxing
Sebe, Nicu
He, Ran
Liang, Jian
contents Multimodal large language models (MLLMs) have achieved strong performance on perception-oriented tasks, yet their ability to perform mathematical spatial reasoning, defined as the capacity to parse and manipulate two- and three-dimensional relations, remains unclear. Humans easily solve textbook-style spatial reasoning problems with over 95% accuracy, but we find that most leading MLLMs fail to reach even 60% on the same tasks. This striking gap highlights spatial reasoning as a fundamental weakness of current models. To investigate it, we present MathSpatial, the first large-scale, systematic dataset dedicated to mathematical spatial reasoning in MLLMs. MathSpatial provides two complementary subsets: (i) MathSpatial-Bench, a rigorously curated evaluation set of 2,000 problems spanning 3 categories and 11 subtypes, designed to isolate spatial reasoning from perceptual noise; and (ii) MathSpatial-Corpus, a training set of 8,000 problems equipped with verified solutions and structured reasoning traces. All problems are sourced from authentic educational materials and undergo multi-stage quality control, including deduplication, geometric consistency checking, and cross-validated solution verification. Benchmarking 16 leading MLLMs on MathSpatial-Bench reveals that spatial reasoning remains a fundamental bottleneck: even GPT-5 trails human performance by more than 35 percentage points, with particularly poor results on abstract deduction tasks. We further show that training on MathSpatial-Corpus yields consistent improvements across model families, demonstrating the dataset's practical value for advancing spatial reasoning capabilities. MathSpatial is publicly available at https://shuolucs.github.io/MathSpatial.
format Preprint
id arxiv_https___arxiv_org_abs_2602_11635
institution arXiv
publishDate 2026
record_format arxiv
title Do MLLMs Really Understand Space? A Mathematical Reasoning Evaluation
topic Artificial Intelligence
url https://arxiv.org/abs/2602.11635