| Main Authors: | Deng, Wei; Zhang, Xianlin; Qi, Mengshi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2606.02459 |
Similar Items
Explainable Action Form Assessment by Exploiting Multimodal Chain-of-Thoughts Reasoning
by: Qi, Mengshi, et al.
Published: (2025)
Chain-of-Evidence Multimodal Reasoning for Few-shot Temporal Action Localization
by: Qi, Mengshi, et al.
Published: (2025)
Towards Balanced Multi-Modal Learning in 3D Human Pose Estimation
by: Qi, Mengshi, et al.
Published: (2025)
Global-Local Tree Search in VLMs for 3D Indoor Scene Generation
by: Deng, Wei, et al.
Published: (2025)
Question-Aware Evidence Ledgers for Video Relational Reasoning
by: Ou, Yilin, et al.
Published: (2026)
Robust Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning
by: Qi, Mengshi, et al.
Published: (2025)
InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models
by: Deng, Nianchen, et al.
Published: (2025)
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
by: Zhou, Shengchao, et al.
Published: (2025)
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
by: Wu, Junfei, et al.
Published: (2025)
Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning
by: Deng, Huilin, et al.
Published: (2025)
T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving
by: Lv, Changsheng, et al.
Published: (2024)
Multi-Stage Contrastive Regression for Action Quality Assessment
by: An, Qi, et al.
Published: (2024)
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
by: Jia, Mengdi, et al.
Published: (2025)
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models
by: Cheng, An-Chieh, et al.
Published: (2024)
See, Remember, Explore: A Benchmark and Baselines for Streaming Spatial Reasoning
by: Wei, Yuxi, et al.
Published: (2026)
Spatial 3D-LLM: Exploring Spatial Awareness in 3D Vision-Language Models
by: Wang, Xiaoyan, et al.
Published: (2025)
Vision-Language Memory for Spatial Reasoning
by: Liu, Zuntao, et al.
Published: (2025)
ActFER: Agentic Facial Expression Recognition via Active Tool-Augmented Visual Reasoning
by: Liu, Shifeng, et al.
Published: (2026)
SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models
by: Guo, Xianda, et al.
Published: (2024)
Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Spatial Reasoning
by: Tang, Yihong, et al.
Published: (2024)
SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery
by: Cao, Meng, et al.
Published: (2025)
Ascending the Infinite Ladder: Benchmarking Spatial Deformation Reasoning in Vision-Language Models
by: Zhang, Jiahuan, et al.
Published: (2025)
Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes
by: Gholami, Mohsen, et al.
Published: (2025)
Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models
by: Huang, Xinmiao, et al.
Published: (2025)
Learning Group Interactions and Semantic Intentions for Multi-Object Trajectory Prediction
by: Qi, Mengshi, et al.
Published: (2024)
Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation
by: Zhao, Zhe, et al.
Published: (2024)
Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes
by: Feng, Zhiyuan, et al.
Published: (2025)
HiSpatial: Taming Hierarchical 3D Spatial Understanding in Vision-Language Models
by: Liang, Huizhi, et al.
Published: (2026)
ViThinker: Active Vision-Language Reasoning via Dynamic Perceptual Querying
by: You, Weihang, et al.
Published: (2026)
Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation
by: Ma, Weijian, et al.
Published: (2026)
Uncovering the human motion pattern: Pattern Memory-based Diffusion Model for Trajectory Prediction
by: Yang, Yuxin, et al.
Published: (2024)
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
by: Liu, Yang, et al.
Published: (2025)
Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models
by: Li, Ling, et al.
Published: (2025)
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
by: Li, Hongxing, et al.
Published: (2025)
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning
by: Chng, Yong Xien, et al.
Published: (2025)
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models
by: Stogiannidis, Ilias, et al.
Published: (2025)
VIoTGPT: Learning to Schedule Vision Tools in LLMs towards Intelligent Video Internet of Things
by: Zhong, Yaoyao, et al.
Published: (2023)
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
by: Wang, Jiaqi, et al.
Published: (2025)
Enhancing MLLM Spatial Understanding via Active 3D Scene Exploration for Multi-Perspective Reasoning
by: Chen, Jiahua, et al.
Published: (2026)
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models
by: Qi, Jianing, et al.
Published: (2025)