| Main Authors: | Pan, Xingang, Zhan, Xiaohang, Shi, Jianping, Luo, Ping, Wang, Xiaogang, Tang, Xiaoou |
|---|---|
| Format: | Preprint |
| Published: | 2017 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/1712.06080 |
Similar Items
MTPano: Multi-Task Panoramic Scene Understanding via Label-Free Integration of Dense Prediction Priors
by: Zhang, Jingdong, et al.
Published: (2026)
ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
by: Chen, Yongwei, et al.
Published: (2024)
Enhancing MLLM Spatial Understanding via Active 3D Scene Exploration for Multi-Perspective Reasoning
by: Chen, Jiahua, et al.
Published: (2026)
Deep Skin Lesion Segmentation with Transformer-CNN Fusion: Toward Intelligent Skin Cancer Analysis
by: Wang, Xin, et al.
Published: (2025)
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
by: Kong, Quan, et al.
Published: (2024)
MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior
by: Chen, Honghua, et al.
Published: (2024)
Advances in Deep Concealed Scene Understanding
by: Fan, Deng-Ping, et al.
Published: (2023)
RieMind: Geometry-Grounded Spatial Agent for Scene Understanding
by: Ropero, Fernando, et al.
Published: (2026)
STaR: Seamless Spatial-Temporal Aware Motion Retargeting with Penetration and Consistency Constraints
by: Yang, Xiaohang, et al.
Published: (2025)
LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering
by: Zhan, Xiaohang, et al.
Published: (2025)
TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding
by: Yang, Fan, et al.
Published: (2026)
SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding
by: Zheng, Hongpei, et al.
Published: (2025)
Spatial Preference Rewarding for MLLMs Spatial Understanding
by: Qiu, Han, et al.
Published: (2025)
STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction
by: Liao, Zhimin, et al.
Published: (2025)
From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space
by: Bu, Jiazi, et al.
Published: (2026)
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
by: Xu, Runsen, et al.
Published: (2025)
SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation
by: Zhang, Jiwen, et al.
Published: (2026)
Agentic 3D Scene Generation with Spatially Contextualized VLMs
by: Liu, Xinhang, et al.
Published: (2025)
ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization
by: Yang, Yixin, et al.
Published: (2024)
FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing
by: Wei, Tianyi, et al.
Published: (2025)
Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames
by: Chen, Chao, et al.
Published: (2023)
SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes
by: Liu, Tianhui, et al.
Published: (2026)
NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
by: Tian, Kexin, et al.
Published: (2025)
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
by: Chen, Yongwei, et al.
Published: (2024)
SpatialBot: Precise Spatial Understanding with Vision Language Models
by: Cai, Wenxiao, et al.
Published: (2024)
Enhancing Spatial Understanding in Image Generation via Reward Modeling
by: Tang, Zhenyu, et al.
Published: (2026)
Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?
by: Feng, Bo, et al.
Published: (2025)
SURPRISE3D: A Dataset for Spatial Understanding and Reasoning in Complex 3D Scenes
by: Huang, Jiaxin, et al.
Published: (2025)
FreeQ-Graph: Free-form Querying with Semantic Consistent Scene Graph for 3D Scene Understanding
by: Zhan, Chenlu, et al.
Published: (2025)
Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Spatial Reasoning
by: Tang, Yihong, et al.
Published: (2024)
Spectral-Spatial Self-Supervised Learning for Few-Shot Hyperspectral Image Classification
by: Chen, Wenchen, et al.
Published: (2025)
Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition
by: Xu, Xiaogang, et al.
Published: (2024)
Low-Light Video Enhancement with An Effective Spatial-Temporal Decomposition Paradigm
by: Xu, Xiaogang, et al.
Published: (2026)
Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding
by: Mirjalili, Vahid, et al.
Published: (2025)
pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning
by: Luo, Zhanpeng, et al.
Published: (2026)
Scenario Understanding of Traffic Scenes Through Large Visual Language Models
by: Rivera, Esteban, et al.
Published: (2025)
SSR: Pushing the Limit of Spatial Intelligence with Structured Scene Reasoning
by: Zhang, Yi, et al.
Published: (2026)
Semantic Foam: Unifying Spatial and Semantic Scene Decomposition
by: Sharafeldin, Amr, et al.
Published: (2026)
VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
by: Shi, Jiapeng, et al.
Published: (2026)
Spatial Chain-of-Thought: Bridging Understanding and Generation Models for Spatial Reasoning Generation
by: Chen, Wei, et al.
Published: (2026)