:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yiming; Chen, Jiacheng; Tan, Jiaqi; Mao, Yongsen; Chen, Wenhu; Chang, Angel X.
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.24300

Similar Items

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes
by: Liu, Tianhui, et al.
Published: (2026)

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
by: Wang, Yuxin, et al.
Published: (2025)

Doctor: Optimizing Container Rebuild Efficiency by Instruction Re-Orchestration
by: Zhu, Zhiling, et al.
Published: (2025)

Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
by: Lee, Han-Hung, et al.
Published: (2024)

SeqVLM: Proposal-Guided Multi-View Sequences Reasoning via VLM for Zero-Shot 3D Visual Grounding
by: Lin, Jiawen, et al.
Published: (2025)

ReMedi: Reasoner for Medical Clinical Prediction
by: Cao, Yushi, et al.
Published: (2026)

SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning
by: Zhang, Jian, et al.
Published: (2026)

VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
by: He, Jianxiang, et al.
Published: (2025)

Rebuilding Public Confidence in Educational Assessment
by: Richardson, Mary
Published: (2022)

VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
by: Xu, Runsen, et al.
Published: (2024)

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
by: Chen, Boyuan, et al.
Published: (2024)

ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding
by: Wang, Austin T., et al.
Published: (2025)

SpatialLM: Training Large Language Models for Structured Indoor Modeling
by: Mao, Yongsen, et al.
Published: (2025)

S2O: Static to Openable Enhancement for Articulated 3D Objects
by: Iliash, Denys, et al.
Published: (2024)

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
by: Jiang, Ziyan, et al.
Published: (2024)

CREG: Compass Relational Evidence Graph for Characterizing Directional Structure in VLM Spatial-Reasoning Attribution
by: Tan, Kaizhen, et al.
Published: (2026)

VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents
by: Meng, Rui, et al.
Published: (2025)

How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence
by: Chen, Yue, et al.
Published: (2026)

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
by: Wang, Haozhe, et al.
Published: (2026)

Rebuilding Syria
Published: (2019)

SPATIALGEN: Layout-guided 3D Indoor Scene Generation
by: Fang, Chuan, et al.
Published: (2025)

VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
by: Jia, Yiming, et al.
Published: (2025)

CoRe3D: Collaborative Reasoning as a Foundation for 3D Intelligence
by: Yu, Tianjiao, et al.
Published: (2025)

MLLM-4D: Towards Visual-based Spatial-Temporal Intelligence
by: Yin, Xingyilang, et al.
Published: (2026)

VISOR: VIsual Spatial Object Reasoning for Language-driven Object Navigation
by: Taioli, Francesco, et al.
Published: (2026)

Do 3D Large Language Models Really Understand 3D Spatial Relationships?
by: Ma, Xianzheng, et al.
Published: (2026)

Is your VLM Sky-Ready? A Comprehensive Spatial Intelligence Benchmark for UAV Navigation
by: Zhang, Lingfeng, et al.
Published: (2025)

UV-processing of icy pebbles in the outer parts of VSI-turbulent disks
by: Flores-Rivera, Lizxandra, et al.
Published: (2024)

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation
by: Ma, Wentao, et al.
Published: (2025)

ReVul-CoT: Towards Effective Software Vulnerability Assessment with Retrieval-Augmented Generation and Chain-of-Thought Prompting
by: Chen, Zhijie, et al.
Published: (2025)

ODMixer: Fine-grained Spatial-temporal MLP for Metro Origin-Destination Prediction
by: Liu, Yang, et al.
Published: (2024)

Self-Rebuilding Artificial Mimetic Super-Intelligence: Proof of Ubiquitous Regeneration
by: Tabary, Frédéric
Published: (2025)

DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models
by: Wang, Bowen, et al.
Published: (2024)

Attention in Space: Functional Roles of VLM Heads for Spatial Reasoning
by: Ma, Xueqi, et al.
Published: (2026)

G²VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
by: Hu, Wenbo, et al.
Published: (2025)

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
by: Wang, Haozhe, et al.
Published: (2025)

Rebuilding broken hearts
Published: (2004)

Rebuilding the food pyramid
Published: (2003)

SpatialImaginer: Towards Adaptive Visual Imagination for Spatial Reasoning
by: Li, Yian, et al.
Published: (2026)

DeconDTN-Toolkit: A Library for Evaluation and Enhancement of Robustness to Provenance Shift
by: Tan, Yongsen, et al.
Published: (2026)