:: Library Catalog

Bibliographic Details
Main Authors:	Zheng, Hongpei; Li, Shijie; Li, Yanran; Yin, Hujun
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.03284

Similar Items

Reg3D: Reconstructive Geometry Instruction Tuning for 3D Scene Understanding
by: Zheng, Hongpei, et al.
Published: (2025)

SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
by: Ma, Wufei, et al.
Published: (2025)

Geometric Prior-Guided Neural Implicit Surface Reconstruction in the Wild
by: Xiang, Lintao, et al.
Published: (2025)

PointGS: Point Attention-Aware Sparse View Synthesis with Gaussian Splatting
by: Xiang, Lintao, et al.
Published: (2025)

IntentionNav: A Benchmark for Intent-Driven Object Navigation from Implicit Human Instruction
by: Qian, Lin, et al.
Published: (2026)

NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding
by: Zhai, Hongjia, et al.
Published: (2024)

Enhancing MLLM Spatial Understanding via Active 3D Scene Exploration for Multi-Perspective Reasoning
by: Chen, Jiahua, et al.
Published: (2026)

SURPRISE3D: A Dataset for Spatial Understanding and Reasoning in Complex 3D Scenes
by: Huang, Jiaxin, et al.
Published: (2025)

3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
by: Zhang, Frank, et al.
Published: (2024)

Grounding by Remembering: Cross-Scene and In-Scene Memory for 3D Functional Affordances
by: Wang, Qirui, et al.
Published: (2026)

SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning
by: Zhang, Jian, et al.
Published: (2026)

HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation
by: Dong, Wenqi, et al.
Published: (2025)

SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion
by: Guo, Xiyue, et al.
Published: (2025)

GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
by: Li, Hao, et al.
Published: (2023)

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes
by: Liu, Tianhui, et al.
Published: (2026)

CLEAR-IR: Clarity-Enhanced Active Reconstruction of Infrared Imagery
by: Shankar, Nathan, et al.
Published: (2025)

InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes
by: Yang, Zesong, et al.
Published: (2025)

Intelligent Spatial Perception by Building Hierarchical 3D Scene Graphs for Indoor Scenarios with the Help of LLMs
by: Cheng, Yao, et al.
Published: (2025)

COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence
by: Zhang, Zefeng, et al.
Published: (2025)

Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding
by: Jeon, Yerim, et al.
Published: (2025)

HMR3D: Hierarchical Multimodal Representation for 3D Scene Understanding with Large Vision-Language Model
by: Li, Chen, et al.
Published: (2025)

Scene-R1: Video-Grounded Large Language Models for 3D Scene Reasoning without 3D Annotations
by: Yuan, Zhihao, et al.
Published: (2025)

A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning
by: Jiang, Siyang, et al.
Published: (2025)

Generating Human Motion in 3D Scenes from Text Descriptions
by: Cen, Zhi, et al.
Published: (2024)

Leveraging 2D-VLM for Label-Free 3D Segmentation in Large-Scale Outdoor Scene Understanding
by: Nishimura, Toshihiko, et al.
Published: (2026)

SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations
by: Zhang, Songchun, et al.
Published: (2025)

PathoHR: Hierarchical Reasoning for Vision-Language Models in Pathology
by: Huang, Yating, et al.
Published: (2025)

S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction
by: Zheng, Guangting, et al.
Published: (2025)

Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding
by: Wang, Ziyang, et al.
Published: (2025)

Boosting MLLM Spatial Reasoning with Geometrically Referenced 3D Scene Representations
by: Yuan, Jiangye, et al.
Published: (2026)

SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving
by: Li, Yiming, et al.
Published: (2023)

3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
by: Huang, Ting, et al.
Published: (2025)

Open-Vocabulary Octree-Graph for 3D Scene Understanding
by: Wang, Zhigang, et al.
Published: (2024)

SAM-Guided Masked Token Prediction for 3D Scene Understanding
by: Chen, Zhimin, et al.
Published: (2024)

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
by: Luo, Jingzhou, et al.
Published: (2025)

Spatial As Deep: Spatial CNN for Traffic Scene Understanding
by: Pan, Xingang, et al.
Published: (2017)

IL3D: A Large-Scale Indoor Layout Dataset for LLM-Driven 3D Scene Generation
by: Zhou, Wenxu, et al.
Published: (2025)

SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery
by: Cao, Meng, et al.
Published: (2025)

R2G: Reasoning to Ground in 3D Scenes
by: Li, Yixuan, et al.
Published: (2024)

Think3D: Thinking with Space for Spatial Reasoning
by: Zhang, Zaibin, et al.
Published: (2026)