| Main Authors: | Zhang, Yiming, Chen, Jiacheng, Tan, Jiaqi, Mao, Yongsen, Chen, Wenhu, Chang, Angel X. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.24300 |
Similar Items
SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes
by: Liu, Tianhui, et al.
Published: (2026)
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
by: Wang, Yuxin, et al.
Published: (2025)
Doctor: Optimizing Container Rebuild Efficiency by Instruction Re-Orchestration
by: Zhu, Zhiling, et al.
Published: (2025)
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
by: Lee, Han-Hung, et al.
Published: (2024)
SeqVLM: Proposal-Guided Multi-View Sequences Reasoning via VLM for Zero-Shot 3D Visual Grounding
by: Lin, Jiawen, et al.
Published: (2025)
ReMedi: Reasoner for Medical Clinical Prediction
by: Cao, Yushi, et al.
Published: (2026)
SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning
by: Zhang, Jian, et al.
Published: (2026)
VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
by: He, Jianxiang, et al.
Published: (2025)
Rebuilding Public Confidence in Educational Assessment
by: Richardson, Mary
Published: (2022)
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
by: Xu, Runsen, et al.
Published: (2024)
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
by: Chen, Boyuan, et al.
Published: (2024)
ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding
by: Wang, Austin T., et al.
Published: (2025)
SpatialLM: Training Large Language Models for Structured Indoor Modeling
by: Mao, Yongsen, et al.
Published: (2025)
S2O: Static to Openable Enhancement for Articulated 3D Objects
by: Iliash, Denys, et al.
Published: (2024)
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
by: Jiang, Ziyan, et al.
Published: (2024)
CREG: Compass Relational Evidence Graph for Characterizing Directional Structure in VLM Spatial-Reasoning Attribution
by: Tan, Kaizhen, et al.
Published: (2026)
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents
by: Meng, Rui, et al.
Published: (2025)
How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence
by: Chen, Yue, et al.
Published: (2026)
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
by: Wang, Haozhe, et al.
Published: (2026)
Rebuilding Syria
Published: (2019)
SPATIALGEN: Layout-guided 3D Indoor Scene Generation
by: Fang, Chuan, et al.
Published: (2025)
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
by: Jia, Yiming, et al.
Published: (2025)
CoRe3D: Collaborative Reasoning as a Foundation for 3D Intelligence
by: Yu, Tianjiao, et al.
Published: (2025)
MLLM-4D: Towards Visual-based Spatial-Temporal Intelligence
by: Yin, Xingyilang, et al.
Published: (2026)
VISOR: VIsual Spatial Object Reasoning for Language-driven Object Navigation
by: Taioli, Francesco, et al.
Published: (2026)
Do 3D Large Language Models Really Understand 3D Spatial Relationships?
by: Ma, Xianzheng, et al.
Published: (2026)
Is your VLM Sky-Ready? A Comprehensive Spatial Intelligence Benchmark for UAV Navigation
by: Zhang, Lingfeng, et al.
Published: (2025)
UV-processing of icy pebbles in the outer parts of VSI-turbulent disks
by: Flores-Rivera, Lizxandra, et al.
Published: (2024)
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation
by: Ma, Wentao, et al.
Published: (2025)
ReVul-CoT: Towards Effective Software Vulnerability Assessment with Retrieval-Augmented Generation and Chain-of-Thought Prompting
by: Chen, Zhijie, et al.
Published: (2025)
ODMixer: Fine-grained Spatial-temporal MLP for Metro Origin-Destination Prediction
by: Liu, Yang, et al.
Published: (2024)
Self-Rebuilding Artificial Mimetic Super-Intelligence: Proof of Ubiquitous Regeneration
by: Tabary, Frédéric
Published: (2025)
DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models
by: Wang, Bowen, et al.
Published: (2024)
Attention in Space: Functional Roles of VLM Heads for Spatial Reasoning
by: Ma, Xueqi, et al.
Published: (2026)
G$^2$VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
by: Hu, Wenbo, et al.
Published: (2025)
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
by: Wang, Haozhe, et al.
Published: (2025)
Rebuilding broken hearts
Published: (2004)
Rebuilding the food pyramid
Published: (2003)
SpatialImaginer: Towards Adaptive Visual Imagination for Spatial Reasoning
by: Li, Yian, et al.
Published: (2026)
DeconDTN-Toolkit: A Library for Evaluation and Enhancement of Robustness to Provenance Shift
by: Tan, Yongsen, et al.
Published: (2026)