:: Library Catalog

Bibliographic Details
Main Authors:	Guo, Yejie; Hou, Yunzhong; Ma, Wufei; Tang, Meng; Yang, Ming-Hsuan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.16688

Similar Items

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
by: Wang, Xingrui, et al.
Published: (2025)

SUNY: A Visual Interpretation Framework for Convolutional Neural Networks from a Necessary and Sufficient Perspective
by: Xuan, Xiwei, et al.
Published: (2023)

Extreme Amodal Face Detection
by: Song, Changlin, et al.
Published: (2025)

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
by: Wang, Xingrui, et al.
Published: (2024)

Learning Spatial-Semantic Features for Robust Video Object Segmentation
by: Li, Xin, et al.
Published: (2024)

Pursuing Better Decision Boundaries for Long-Tailed Object Detection via Category Information Amount
by: Ma, Yanbiao, et al.
Published: (2025)

SUMI-IFL: An Information-Theoretic Framework for Image Forgery Localization with Sufficiency and Minimality Constraints
by: Sheng, Ziqi, et al.
Published: (2024)

LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language Models
by: Tian, Shi-Yu, et al.
Published: (2026)

Mamba-CAD: State Space Model For 3D Computer-Aided Design Generative Modeling
by: Li, Xueyang, et al.
Published: (2026)

M$^3$-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation
by: Chen, Zixuan, et al.
Published: (2024)

Attention in Space: Functional Roles of VLM Heads for Spatial Reasoning
by: Ma, Xueqi, et al.
Published: (2026)

Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek
by: Li, Xueyang, et al.
Published: (2025)

Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning
by: Guo, Xingang, et al.
Published: (2025)

Edit3r: Instant 3D Scene Editing from Sparse Unposed Images
by: Liu, Jiageng, et al.
Published: (2025)

MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs
by: Ge, Haonan, et al.
Published: (2025)

Enhancing Spatial Reasoning through Visual and Textual Thinking
by: Liang, Xun, et al.
Published: (2025)

ResAgent: Entropy-based Prior Point Discovery and Visual Reasoning for Referring Expression Segmentation
by: Wang, Yihao, et al.
Published: (2026)

An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models
by: Shiri, Fatemeh, et al.
Published: (2024)

Spatial Policy: Guiding Visuomotor Robotic Manipulation with Spatial-Aware Modeling and Reasoning
by: Liu, Yijun, et al.
Published: (2025)

ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models
by: Mou, Tingshu, et al.
Published: (2026)

Minimal Sufficient Views: A DNN model making predictions with more evidence has higher accuracy
by: Kawano, Keisuke, et al.
Published: (2024)

Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
by: Wang, Jiayu, et al.
Published: (2024)

Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
by: Zhan, Yufei, et al.
Published: (2025)

PAS: A Training-Free Stabilizer for Temporal Encoding in Video LLMs
by: Sun, Bowen, et al.
Published: (2025)

Geometrically-Constrained Agent for Spatial Reasoning
by: Chen, Zeren, et al.
Published: (2025)

Make Geometry Matter for Spatial Reasoning
by: Zhang, Shihua, et al.
Published: (2026)

NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
by: Tian, Kexin, et al.
Published: (2025)

MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
by: Pan, Zhenyu, et al.
Published: (2025)

Sufficient, Necessary and Complete Causal Explanations in Image Classification
by: Kelly, David A, et al.
Published: (2025)

SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
by: Ma, Wufei, et al.
Published: (2025)

KnowVal: A Knowledge-Augmented and Value-Guided Autonomous Driving System
by: Xia, Zhongyu, et al.
Published: (2025)

VISOR: VIsual Spatial Object Reasoning for Language-driven Object Navigation
by: Taioli, Francesco, et al.
Published: (2026)

Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding
by: Wu, Hang, et al.
Published: (2026)

Chain-of-Look Spatial Reasoning for Dense Surgical Instrument Counting
by: Bhyri, Rishikesh, et al.
Published: (2026)

SkinGPT-X: A Self-Evolving Collaborative Multi-Agent System for Transparent and Trustworthy Dermatological Diagnosis
by: Chen, Zhangtianyi, et al.
Published: (2026)

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes
by: Liu, Tianhui, et al.
Published: (2026)

DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
by: Wu, Hang, et al.
Published: (2025)

Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling
by: Jha, Saurav, et al.
Published: (2025)

CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning
by: Wu, Hang, et al.
Published: (2026)

Improved Visual-Spatial Reasoning via R1-Zero-Like Training
by: Liao, Zhenyi, et al.
Published: (2025)