| Main Authors: | Zheng, Kaizhi, Zha, Ruijian, Xu, Zishuo, Gu, Jing, Yang, Jie, Wang, Xin Eric |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.15765 |
Similar Items
Self-Evolving 3D Scene Generation from a Single Image
by: Zheng, Kaizhi, et al.
Published: (2025)
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens
by: Zheng, Kaizhi, et al.
Published: (2023)
BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View
by: Yang, Yuxiang, et al.
Published: (2023)
Grounding 3D Scene Affordance From Egocentric Interactions
by: Liu, Cuiyu, et al.
Published: (2024)
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
by: Zheng, Kaizhi, et al.
Published: (2024)
Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
by: Zhang, Shengjun, et al.
Published: (2025)
NavCrafter: Exploring 3D Scenes from a Single Image
by: Duan, Hongbo, et al.
Published: (2026)
GEN3D: Generating Domain-Free 3D Scenes from a Single Image
by: Zhang, Yuxin, et al.
Published: (2025)
JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents
by: Zheng, Kaizhi, et al.
Published: (2022)
SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
by: Meng, Yanxu, et al.
Published: (2025)
MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator
by: He, Xuehai, et al.
Published: (2025)
GRIT: Teaching MLLMs to Think with Images
by: Fan, Yue, et al.
Published: (2025)
ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary
by: Gu, Zeqi, et al.
Published: (2025)
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
by: Li, Zizhang, et al.
Published: (2025)
Elucidating and Endowing the Diffusion Training Paradigm for General Image Restoration
by: Lu, Xin, et al.
Published: (2025)
3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
by: Zhang, Frank, et al.
Published: (2024)
InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable Objects
by: Cai, Xinhao, et al.
Published: (2025)
Feature-Optimized Vision for Adaptive 3D Scene Reconstruction
by: Liang, Eric
Published: (2026)
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
by: Qian, Ming, et al.
Published: (2026)
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
by: Sun, Wenqiang, et al.
Published: (2024)
Beyond Voxel 3D Editing: Learning from 3D Masks and Self-Constructed Data
by: Xu, Yizhao, et al.
Published: (2026)
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
by: Huang, Yuanhui, et al.
Published: (2024)
Compress3D: a Compressed Latent Space for 3D Generation from a Single Image
by: Zhang, Bowen, et al.
Published: (2024)
LLMI3D: MLLM-based 3D Perception from a Single 2D Image
by: Yang, Fan, et al.
Published: (2024)
VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image
by: Xu, Sicheng, et al.
Published: (2025)
GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
by: Shao, Yawen, et al.
Published: (2024)
DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features
by: Wang, Letian, et al.
Published: (2024)
Wonder3D++: Cross-domain Diffusion for High-fidelity 3D Generation from a Single Image
by: Yang, Yuxiao, et al.
Published: (2025)
Hunyuan3D 1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
by: Yang, Xianghui, et al.
Published: (2024)
NOVA3D: Normal Aligned Video Diffusion Model for Single Image to 3D Generation
by: Yang, Yuxiao, et al.
Published: (2025)
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models
by: Xu, Yifan, et al.
Published: (2025)
Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data
by: Feng, Tuo, et al.
Published: (2024)
FurniScene: A Large-scale 3D Room Dataset with Intricate Furnishing Scenes
by: Zhang, Genghao, et al.
Published: (2024)
NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows
by: Tang, Zhenggang, et al.
Published: (2024)
SceneTok: A Compressed, Diffusable Token Space for 3D Scenes
by: Asim, Mohammad, et al.
Published: (2026)
GenXD: Generating Any 3D and 4D Scenes
by: Zhao, Yuyang, et al.
Published: (2024)
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
by: Zhou, Shijie, et al.
Published: (2024)
LatentEditor: Text Driven Local Editing of 3D Scenes
by: Khalid, Umar, et al.
Published: (2023)
Text-Scene: A Scene-to-Language Parsing Framework for 3D Scene Understanding
by: Li, Haoyuan, et al.
Published: (2025)
EXPLORE-Bench: Egocentric Scene Prediction with Long-Horizon Reasoning
by: Yu, Chengjun, et al.
Published: (2026)