| Main Authors: | Wang, Puyi, Wang, Yuhao, Li, Linjie, Yang, Zhengyuan, Lin, Kevin Qinghong, Li, Yangguang, Cheng, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.19587 |
Similar Items
GenXD: Generating Any 3D and 4D Scenes
by: Zhao, Yuyang, et al.
Published: (2024)
Planning with the Views via Scene Self-Exploration
by: Wang, Kangrui, et al.
Published: (2026)
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
by: Lin, Kevin Qinghong, et al.
Published: (2024)
Code2World: A GUI World Model via Renderable Code Generation
by: Zheng, Yuhao, et al.
Published: (2026)
Articulated Object Interaction in Unknown Scenes with Whole-Body Mobile Manipulation
by: Mittal, Mayank, et al.
Published: (2021)
HetScene: Heterogeneity-Aware Diffusion for Dense Indoor Scene Generation
by: Chen, Zini, et al.
Published: (2026)
Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
by: Zhang, Jihai, et al.
Published: (2025)
Articulated 3D Scene Graphs for Open-World Mobile Manipulation
by: Büchner, Martin, et al.
Published: (2026)
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
by: Lin, Kevin Qinghong, et al.
Published: (2024)
Autonomous Implicit Indoor Scene Reconstruction with Frontier Exploration
by: Zeng, Jing, et al.
Published: (2024)
CustomTex: High-fidelity Indoor Scene Texturing via Multi-Reference Customization
by: Chen, Weilin, et al.
Published: (2026)
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
by: Yu, Weihao, et al.
Published: (2023)
DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding
by: Yu, Xiaoxuan, et al.
Published: (2024)
The Scene Language: Representing Scenes with Programs, Words, and Embeddings
by: Zhang, Yunzhi, et al.
Published: (2024)
SceneSmith: Agentic Generation of Simulation-Ready Indoor Scenes
by: Pfaff, Nicholas, et al.
Published: (2026)
RoomCraft: Controllable and Complete 3D Indoor Scene Generation
by: Zhou, Mengqi, et al.
Published: (2025)
UniHM: Universal Human Motion Generation with Object Interactions in Indoor Scenes
by: Geng, Zichen, et al.
Published: (2025)
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
by: Zhai, Guangyao, et al.
Published: (2024)
TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning
by: Feng, ZhiYuan, et al.
Published: (2026)
DisCo: Disentangled Control for Realistic Human Dance Generation
by: Wang, Tan, et al.
Published: (2023)
MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models
by: Fang, Shaoheng, et al.
Published: (2025)
ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting
by: Zhu, Ruijie, et al.
Published: (2025)
SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation
by: Tang, Song, et al.
Published: (2026)
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
by: Yu, Weihao, et al.
Published: (2024)
Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification
by: Barros, Artur, et al.
Published: (2025)
Gen4D: Synthesizing Humans and Scenes in the Wild
by: Bright, Jerrin, et al.
Published: (2025)
SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks
by: Li, Jialiang, et al.
Published: (2025)
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
by: He, Xuehai, et al.
Published: (2024)
AdaManip: Adaptive Articulated Object Manipulation Environments and Policy Learning
by: Wang, Yuanfei, et al.
Published: (2025)
Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning
by: Ran, Xingjian, et al.
Published: (2025)
Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs
by: Xia, Wenke, et al.
Published: (2023)
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes
by: Chen, Xiao, et al.
Published: (2025)
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
by: Wang, Yaoting, et al.
Published: (2024)
RPMArt: Towards Robust Perception and Manipulation for Articulated Objects
by: Wang, Junbo, et al.
Published: (2024)
EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control
by: Zhang, Chushan, et al.
Published: (2026)
Editable Concept Bottleneck Models
by: Hu, Lijie, et al.
Published: (2024)
Code2Video: A Code-centric Paradigm for Educational Video Generation
by: Chen, Yanzhe, et al.
Published: (2025)
SAEC: Scene-Aware Enhanced Edge-Cloud Collaborative Industrial Vision Inspection with Multimodal LLM
by: Tian, Yuhao, et al.
Published: (2025)
VeriGraph: Scene Graphs for Execution Verifiable Robot Planning
by: Ekpo, Daniel, et al.
Published: (2024)
ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving
by: Ruan, Qiyu, et al.
Published: (2026)