Staff View: Spatial4D-Bench: A Versatile 4D Spatial Intelligence Benchmark :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Pan; Liu, Yang; Wu, Guile; Corral-Soto, Eduardo R.; Huang, Chengjie; Xu, Binbin; Bai, Dongfeng; Yan, Xu; Ren, Yuan; Chen, Xingxin; Wu, Yizhe; Huang, Tao; Wan, Wenjun; Wu, Xin; Zhou, Pei; Dai, Xuyang; Lv, Kangbo; Zhang, Hongbo; Fried, Yosef; Ye, Aixue; Feng, Bailan; Chen, Zhenyu; Li, Zhen; Chen, Yingcong; Liao, Yiyi; Liu, Bingbing
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.00092

_version_	1866911491841589248
author	Wang, Pan; Liu, Yang; Wu, Guile; Corral-Soto, Eduardo R.; Huang, Chengjie; Xu, Binbin; Bai, Dongfeng; Yan, Xu; Ren, Yuan; Chen, Xingxin; Wu, Yizhe; Huang, Tao; Wan, Wenjun; Wu, Xin; Zhou, Pei; Dai, Xuyang; Lv, Kangbo; Zhang, Hongbo; Fried, Yosef; Ye, Aixue; Feng, Bailan; Chen, Zhenyu; Li, Zhen; Chen, Yingcong; Liao, Yiyi; Liu, Bingbing
author_facet	Wang, Pan; Liu, Yang; Wu, Guile; Corral-Soto, Eduardo R.; Huang, Chengjie; Xu, Binbin; Bai, Dongfeng; Yan, Xu; Ren, Yuan; Chen, Xingxin; Wu, Yizhe; Huang, Tao; Wan, Wenjun; Wu, Xin; Zhou, Pei; Dai, Xuyang; Lv, Kangbo; Zhang, Hongbo; Fried, Yosef; Ye, Aixue; Feng, Bailan; Chen, Zhenyu; Li, Zhen; Chen, Yingcong; Liao, Yiyi; Liu, Bingbing
contents	4D spatial intelligence involves perceiving and processing how objects move or change over time. Humans naturally possess 4D spatial intelligence, supporting a broad spectrum of spatial reasoning abilities. To what extent can Multimodal Large Language Models (MLLMs) achieve human-level 4D spatial intelligence? In this work, we present Spatial4D-Bench, a versatile 4D spatial intelligence benchmark designed to comprehensively assess the 4D spatial reasoning abilities of MLLMs. Unlike existing spatial intelligence benchmarks that are often small-scale or limited in diversity, Spatial4D-Bench provides a large-scale, multi-task evaluation benchmark consisting of ~40,000 question-answer pairs covering 18 well-defined tasks. We systematically organize these tasks into six cognitive categories: object understanding, scene understanding, spatial relationship understanding, spatiotemporal relationship understanding, spatial reasoning, and spatiotemporal reasoning. Spatial4D-Bench thereby offers a structured and comprehensive benchmark for evaluating the spatial cognition abilities of MLLMs, covering a broad spectrum of tasks that parallel the versatility of human spatial intelligence. We benchmark various state-of-the-art open-source and proprietary MLLMs on Spatial4D-Bench and reveal their substantial limitations in a wide variety of 4D spatial reasoning aspects, such as route planning, action recognition, and physical plausibility reasoning. We hope that the findings provided in this work offer valuable insights to the community and that our benchmark can facilitate the development of more capable MLLMs toward human-level 4D spatial intelligence. More resources can be found on our project page.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_00092
institution	arXiv
publishDate	2025
record_format	arxiv
title	Spatial4D-Bench: A Versatile 4D Spatial Intelligence Benchmark
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2601.00092

Similar Items