| Main Authors: | Yuan, Zhihao, Jiang, Shuyi, Feng, Chun-Mei, Zhang, Yaolun, Cui, Shuguang, Li, Zhen, Zhao, Na |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.17545 |
Similar Items
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
by: Yuan, Zhihao, et al.
Published: (2023)
Empowering Large Language Models with 3D Situation Awareness
by: Yuan, Zhihao, et al.
Published: (2025)
GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians
by: Jiang, Shuyi, et al.
Published: (2024)
R2G: Reasoning to Ground in 3D Scenes
by: Li, Yixuan, et al.
Published: (2024)
RelTopo: Multi-Level Relational Modeling for Driving Scene Topology Reasoning
by: Luo, Yueru, et al.
Published: (2025)
SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes
by: Linghu, Xiongkun, et al.
Published: (2025)
View-on-Graph: Zero-shot 3D Visual Grounding via Vision-Language Reasoning on Scene Graphs
by: Liu, Yuanyuan, et al.
Published: (2025)
Error-Driven Scene Editing for 3D Grounding in Large Language Models
by: Zhang, Yue, et al.
Published: (2025)
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
by: Huang, Haifeng, et al.
Published: (2023)
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
by: Qi, Zhangyang, et al.
Published: (2025)
LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model
by: Yang, Yixuan, et al.
Published: (2024)
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
by: Lyu, Ruiyuan, et al.
Published: (2024)
Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes
by: Jiang, Lihan, et al.
Published: (2024)
Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs
by: Zhong, Yingji, et al.
Published: (2025)
Any 3D Scene is Worth 1K Tokens: 3D-Grounded Representation for Scene Generation at Scale
by: Wei, Dongxu, et al.
Published: (2026)
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
by: Jia, Baoxiong, et al.
Published: (2024)
LSVG: Language-Guided Scene Graphs with 2D-Assisted Multi-Modal Encoding for 3D Visual Grounding
by: Xiao, Feng, et al.
Published: (2025)
PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models
by: Guo, Zilu, et al.
Published: (2025)
Multimodal 3D Reasoning Segmentation with Complex Scenes
by: Jiang, Xueying, et al.
Published: (2024)
Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection
by: Zheng, Chaoda, et al.
Published: (2024)
SceneTeller: Language-to-3D Scene Generation
by: Öcal, Başak Melis, et al.
Published: (2024)
Grounding by Remembering: Cross-Scene and In-Scene Memory for 3D Functional Affordances
by: Wang, Qirui, et al.
Published: (2026)
SceneTeract: Agentic Functional Affordances and VLM Grounding in 3D Scenes
by: Maillard, Léopold, et al.
Published: (2026)
SceneGPT: A Language Model for 3D Scene Understanding
by: Chandhok, Shivam
Published: (2024)
LSD-3D: Large-Scale 3D Driving Scene Generation with Geometry Grounding
by: Ost, Julian, et al.
Published: (2025)
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
by: Wang, Hanyang, et al.
Published: (2025)
ChangingGrounding: 3D Visual Grounding in Changing Scenes
by: Hu, Miao, et al.
Published: (2025)
HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation
by: Dong, Wenqi, et al.
Published: (2025)
DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation
by: Luo, Yueru, et al.
Published: (2024)
Video Perception Models for 3D Scene Synthesis
by: Huang, Rui, et al.
Published: (2025)
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
by: Huang, Ting, et al.
Published: (2025)
SceneGlue: Scene-Aware Transformer for Feature Matching without Scene-Level Annotation
by: Du, Songlin, et al.
Published: (2026)
Flame3D: Zero-shot Compositional Reasoning of 3D Scenes with Agentic Language Models
by: Bharadwaj, Sagar, et al.
Published: (2026)
Let Video Teaches You More: Video-to-Image Knowledge Distillation using DEtection TRansformer for Medical Video Lesion Detection
by: Jiang, Yuncheng, et al.
Published: (2024)
Timeliness-Fidelity Tradeoff in 3D Scene Representations
by: Xu, Xiangmin, et al.
Published: (2024)
Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning
by: Fu, Rao, et al.
Published: (2024)
Unified Scene Representation and Reconstruction for 3D Large Language Models
by: Chu, Tao, et al.
Published: (2024)
Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding
by: Wang, Yunsong, et al.
Published: (2024)
R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding
by: Wu, Qirui, et al.
Published: (2024)
ArtiWorld: LLM-Driven Articulation of 3D Objects in Scenes
by: Yang, Yixuan, et al.
Published: (2025)