
Bibliographic Details
Main Authors:	Deng, Wei; Zhang, Xianlin; Qi, Mengshi
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2606.02459

Similar Items

Explainable Action Form Assessment by Exploiting Multimodal Chain-of-Thoughts Reasoning
by: Qi, Mengshi, et al.
Published: (2025)

Chain-of-Evidence Multimodal Reasoning for Few-shot Temporal Action Localization
by: Qi, Mengshi, et al.
Published: (2025)

Towards Balanced Multi-Modal Learning in 3D Human Pose Estimation
by: Qi, Mengshi, et al.
Published: (2025)

Global-Local Tree Search in VLMs for 3D Indoor Scene Generation
by: Deng, Wei, et al.
Published: (2025)

Question-Aware Evidence Ledgers for Video Relational Reasoning
by: Ou, Yilin, et al.
Published: (2026)

Robust Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning
by: Qi, Mengshi, et al.
Published: (2025)

InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models
by: Deng, Nianchen, et al.
Published: (2025)

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
by: Zhou, Shengchao, et al.
Published: (2025)

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
by: Wu, Junfei, et al.
Published: (2025)

Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning
by: Deng, Huilin, et al.
Published: (2025)

T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving
by: Lv, Changsheng, et al.
Published: (2024)

Multi-Stage Contrastive Regression for Action Quality Assessment
by: An, Qi, et al.
Published: (2024)

OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
by: Jia, Mengdi, et al.
Published: (2025)

SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models
by: Cheng, An-Chieh, et al.
Published: (2024)

See, Remember, Explore: A Benchmark and Baselines for Streaming Spatial Reasoning
by: Wei, Yuxi, et al.
Published: (2026)

Spatial 3D-LLM: Exploring Spatial Awareness in 3D Vision-Language Models
by: Wang, Xiaoyan, et al.
Published: (2025)

Vision-Language Memory for Spatial Reasoning
by: Liu, Zuntao, et al.
Published: (2025)

ActFER: Agentic Facial Expression Recognition via Active Tool-Augmented Visual Reasoning
by: Liu, Shifeng, et al.
Published: (2026)

SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models
by: Guo, Xianda, et al.
Published: (2024)

Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Spatial Reasoning
by: Tang, Yihong, et al.
Published: (2024)

SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery
by: Cao, Meng, et al.
Published: (2025)

Ascending the Infinite Ladder: Benchmarking Spatial Deformation Reasoning in Vision-Language Models
by: Zhang, Jiahuan, et al.
Published: (2025)

Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes
by: Gholami, Mohsen, et al.
Published: (2025)

Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models
by: Huang, Xinmiao, et al.
Published: (2025)

Learning Group Interactions and Semantic Intentions for Multi-Object Trajectory Prediction
by: Qi, Mengshi, et al.
Published: (2024)

Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation
by: Zhao, Zhe, et al.
Published: (2024)

Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes
by: Feng, Zhiyuan, et al.
Published: (2025)

HiSpatial: Taming Hierarchical 3D Spatial Understanding in Vision-Language Models
by: Liang, Huizhi, et al.
Published: (2026)

ViThinker: Active Vision-Language Reasoning via Dynamic Perceptual Querying
by: You, Weihang, et al.
Published: (2026)

Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation
by: Ma, Weijian, et al.
Published: (2026)

Uncovering the human motion pattern: Pattern Memory-based Diffusion Model for Trajectory Prediction
by: Yang, Yuxin, et al.
Published: (2024)

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
by: Liu, Yang, et al.
Published: (2025)

Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models
by: Li, Ling, et al.
Published: (2025)

SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
by: Li, Hongxing, et al.
Published: (2025)

SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning
by: Chng, Yong Xien, et al.
Published: (2025)

Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models
by: Stogiannidis, Ilias, et al.
Published: (2025)

VIoTGPT: Learning to Schedule Vision Tools in LLMs towards Intelligent Video Internet of Things
by: Zhong, Yaoyao, et al.
Published: (2023)

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
by: Wang, Jiaqi, et al.
Published: (2025)

Enhancing MLLM Spatial Understanding via Active 3D Scene Exploration for Multi-Perspective Reasoning
by: Chen, Jiahua, et al.
Published: (2026)

Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models
by: Qi, Jianing, et al.
Published: (2025)