| Main Authors: | Guo, Yejie, Hou, Yunzhong, Ma, Wufei, Tang, Meng, Yang, Ming-Hsuan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.16688 |
Similar Items
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
by: Wang, Xingrui, et al.
Published: (2025)
SUNY: A Visual Interpretation Framework for Convolutional Neural Networks from a Necessary and Sufficient Perspective
by: Xuan, Xiwei, et al.
Published: (2023)
Extreme Amodal Face Detection
by: Song, Changlin, et al.
Published: (2025)
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
by: Wang, Xingrui, et al.
Published: (2024)
Learning Spatial-Semantic Features for Robust Video Object Segmentation
by: Li, Xin, et al.
Published: (2024)
Pursuing Better Decision Boundaries for Long-Tailed Object Detection via Category Information Amount
by: Ma, Yanbiao, et al.
Published: (2025)
SUMI-IFL: An Information-Theoretic Framework for Image Forgery Localization with Sufficiency and Minimality Constraints
by: Sheng, Ziqi, et al.
Published: (2024)
LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language Models
by: Tian, Shi-Yu, et al.
Published: (2026)
Mamba-CAD: State Space Model For 3D Computer-Aided Design Generative Modeling
by: Li, Xueyang, et al.
Published: (2026)
M$^3$-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation
by: Chen, Zixuan, et al.
Published: (2024)
Attention in Space: Functional Roles of VLM Heads for Spatial Reasoning
by: Ma, Xueqi, et al.
Published: (2026)
Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek
by: Li, Xueyang, et al.
Published: (2025)
Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning
by: Guo, Xingang, et al.
Published: (2025)
Edit3r: Instant 3D Scene Editing from Sparse Unposed Images
by: Liu, Jiageng, et al.
Published: (2025)
MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs
by: Ge, Haonan, et al.
Published: (2025)
Enhancing Spatial Reasoning through Visual and Textual Thinking
by: Liang, Xun, et al.
Published: (2025)
ResAgent: Entropy-based Prior Point Discovery and Visual Reasoning for Referring Expression Segmentation
by: Wang, Yihao, et al.
Published: (2026)
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models
by: Shiri, Fatemeh, et al.
Published: (2024)
Spatial Policy: Guiding Visuomotor Robotic Manipulation with Spatial-Aware Modeling and Reasoning
by: Liu, Yijun, et al.
Published: (2025)
ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models
by: Mou, Tingshu, et al.
Published: (2026)
Minimal Sufficient Views: A DNN model making predictions with more evidence has higher accuracy
by: Kawano, Keisuke, et al.
Published: (2024)
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
by: Wang, Jiayu, et al.
Published: (2024)
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
by: Zhan, Yufei, et al.
Published: (2025)
PAS: A Training-Free Stabilizer for Temporal Encoding in Video LLMs
by: Sun, Bowen, et al.
Published: (2025)
Geometrically-Constrained Agent for Spatial Reasoning
by: Chen, Zeren, et al.
Published: (2025)
Make Geometry Matter for Spatial Reasoning
by: Zhang, Shihua, et al.
Published: (2026)
NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
by: Tian, Kexin, et al.
Published: (2025)
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
by: Pan, Zhenyu, et al.
Published: (2025)
Sufficient, Necessary and Complete Causal Explanations in Image Classification
by: Kelly, David A, et al.
Published: (2025)
SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
by: Ma, Wufei, et al.
Published: (2025)
KnowVal: A Knowledge-Augmented and Value-Guided Autonomous Driving System
by: Xia, Zhongyu, et al.
Published: (2025)
VISOR: VIsual Spatial Object Reasoning for Language-driven Object Navigation
by: Taioli, Francesco, et al.
Published: (2026)
Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding
by: Wu, Hang, et al.
Published: (2026)
Chain-of-Look Spatial Reasoning for Dense Surgical Instrument Counting
by: Bhyri, Rishikesh, et al.
Published: (2026)
SkinGPT-X: A Self-Evolving Collaborative Multi-Agent System for Transparent and Trustworthy Dermatological Diagnosis
by: Chen, Zhangtianyi, et al.
Published: (2026)
SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes
by: Liu, Tianhui, et al.
Published: (2026)
DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
by: Wu, Hang, et al.
Published: (2025)
Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling
by: Jha, Saurav, et al.
Published: (2025)
CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning
by: Wu, Hang, et al.
Published: (2026)
Improved Visual-Spatial Reasoning via R1-Zero-Like Training
by: Liao, Zhenyi, et al.
Published: (2025)