Bibliographic Details
Main Authors:	Pan, Xingang; Zhan, Xiaohang; Shi, Jianping; Luo, Ping; Wang, Xiaogang; Tang, Xiaoou
Format:	Preprint
Published:	2017
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/1712.06080

Similar Items

MTPano: Multi-Task Panoramic Scene Understanding via Label-Free Integration of Dense Prediction Priors
by: Zhang, Jingdong, et al.
Published: (2026)

ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
by: Chen, Yongwei, et al.
Published: (2024)

Enhancing MLLM Spatial Understanding via Active 3D Scene Exploration for Multi-Perspective Reasoning
by: Chen, Jiahua, et al.
Published: (2026)

Deep Skin Lesion Segmentation with Transformer-CNN Fusion: Toward Intelligent Skin Cancer Analysis
by: Wang, Xin, et al.
Published: (2025)

WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
by: Kong, Quan, et al.
Published: (2024)

MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior
by: Chen, Honghua, et al.
Published: (2024)

Advances in Deep Concealed Scene Understanding
by: Fan, Deng-Ping, et al.
Published: (2023)

RieMind: Geometry-Grounded Spatial Agent for Scene Understanding
by: Ropero, Fernando, et al.
Published: (2026)

STaR: Seamless Spatial-Temporal Aware Motion Retargeting with Penetration and Consistency Constraints
by: Yang, Xiaohang, et al.
Published: (2025)

LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering
by: Zhan, Xiaohang, et al.
Published: (2025)

TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding
by: Yang, Fan, et al.
Published: (2026)

SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding
by: Zheng, Hongpei, et al.
Published: (2025)

Spatial Preference Rewarding for MLLMs Spatial Understanding
by: Qiu, Han, et al.
Published: (2025)

STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction
by: Liao, Zhimin, et al.
Published: (2025)

From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space
by: Bu, Jiazi, et al.
Published: (2026)

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
by: Xu, Runsen, et al.
Published: (2025)

SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation
by: Zhang, Jiwen, et al.
Published: (2026)

Agentic 3D Scene Generation with Spatially Contextualized VLMs
by: Liu, Xinhang, et al.
Published: (2025)

ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization
by: Yang, Yixin, et al.
Published: (2024)

FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing
by: Wei, Tianyi, et al.
Published: (2025)

Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames
by: Chen, Chao, et al.
Published: (2023)

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes
by: Liu, Tianhui, et al.
Published: (2026)

NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
by: Tian, Kexin, et al.
Published: (2025)

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
by: Chen, Yongwei, et al.
Published: (2024)

SpatialBot: Precise Spatial Understanding with Vision Language Models
by: Cai, Wenxiao, et al.
Published: (2024)

Enhancing Spatial Understanding in Image Generation via Reward Modeling
by: Tang, Zhenyu, et al.
Published: (2026)

Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?
by: Feng, Bo, et al.
Published: (2025)

SURPRISE3D: A Dataset for Spatial Understanding and Reasoning in Complex 3D Scenes
by: Huang, Jiaxin, et al.
Published: (2025)

FreeQ-Graph: Free-form Querying with Semantic Consistent Scene Graph for 3D Scene Understanding
by: Zhan, Chenlu, et al.
Published: (2025)

Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Spatial Reasoning
by: Tang, Yihong, et al.
Published: (2024)

Spectral-Spatial Self-Supervised Learning for Few-Shot Hyperspectral Image Classification
by: Chen, Wenchen, et al.
Published: (2025)

Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition
by: Xu, Xiaogang, et al.
Published: (2024)

Low-Light Video Enhancement with An Effective Spatial-Temporal Decomposition Paradigm
by: Xu, Xiaogang, et al.
Published: (2026)

Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding
by: Mirjalili, Vahid, et al.
Published: (2025)

pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning
by: Luo, Zhanpeng, et al.
Published: (2026)

Scenario Understanding of Traffic Scenes Through Large Visual Language Models
by: Rivera, Esteban, et al.
Published: (2025)

SSR: Pushing the Limit of Spatial Intelligence with Structured Scene Reasoning
by: Zhang, Yi, et al.
Published: (2026)

Semantic Foam: Unifying Spatial and Semantic Scene Decomposition
by: Sharafeldin, Amr, et al.
Published: (2026)

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
by: Shi, Jiapeng, et al.
Published: (2026)

Spatial Chain-of-Thought: Bridging Understanding and Generation Models for Spatial Reasoning Generation
by: Chen, Wei, et al.
Published: (2026)