:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Puyi; Wang, Yuhao; Li, Linjie; Yang, Zhengyuan; Lin, Kevin Qinghong; Li, Yangguang; Cheng, Yu
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.19587

Similar Items

GenXD: Generating Any 3D and 4D Scenes
by: Zhao, Yuyang, et al.
Published: (2024)

Planning with the Views via Scene Self-Exploration
by: Wang, Kangrui, et al.
Published: (2026)

VideoGUI: A Benchmark for GUI Automation from Instructional Videos
by: Lin, Kevin Qinghong, et al.
Published: (2024)

Code2World: A GUI World Model via Renderable Code Generation
by: Zheng, Yuhao, et al.
Published: (2026)

Articulated Object Interaction in Unknown Scenes with Whole-Body Mobile Manipulation
by: Mittal, Mayank, et al.
Published: (2021)

HetScene: Heterogeneity-Aware Diffusion for Dense Indoor Scene Generation
by: Chen, Zini, et al.
Published: (2026)

Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
by: Zhang, Jihai, et al.
Published: (2025)

Articulated 3D Scene Graphs for Open-World Mobile Manipulation
by: Büchner, Martin, et al.
Published: (2026)

ShowUI: One Vision-Language-Action Model for GUI Visual Agent
by: Lin, Kevin Qinghong, et al.
Published: (2024)

Autonomous Implicit Indoor Scene Reconstruction with Frontier Exploration
by: Zeng, Jing, et al.
Published: (2024)

CustomTex: High-fidelity Indoor Scene Texturing via Multi-Reference Customization
by: Chen, Weilin, et al.
Published: (2026)

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
by: Yu, Weihao, et al.
Published: (2023)

DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding
by: Yu, Xiaoxuan, et al.
Published: (2024)

The Scene Language: Representing Scenes with Programs, Words, and Embeddings
by: Zhang, Yunzhi, et al.
Published: (2024)

SceneSmith: Agentic Generation of Simulation-Ready Indoor Scenes
by: Pfaff, Nicholas, et al.
Published: (2026)

RoomCraft: Controllable and Complete 3D Indoor Scene Generation
by: Zhou, Mengqi, et al.
Published: (2025)

UniHM: Universal Human Motion Generation with Object Interactions in Indoor Scenes
by: Geng, Zichen, et al.
Published: (2025)

EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
by: Zhai, Guangyao, et al.
Published: (2024)

TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning
by: Feng, ZhiYuan, et al.
Published: (2026)

DisCo: Disentangled Control for Realistic Human Dance Generation
by: Wang, Tan, et al.
Published: (2023)

MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models
by: Fang, Shaoheng, et al.
Published: (2025)

ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting
by: Zhu, Ruijie, et al.
Published: (2025)

SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation
by: Tang, Song, et al.
Published: (2026)

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
by: Yu, Weihao, et al.
Published: (2024)

Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification
by: Barros, Artur, et al.
Published: (2025)

Gen4D: Synthesizing Humans and Scenes in the Wild
by: Bright, Jerrin, et al.
Published: (2025)

SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks
by: Li, Jialiang, et al.
Published: (2025)

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
by: He, Xuehai, et al.
Published: (2024)

AdaManip: Adaptive Articulated Object Manipulation Environments and Policy Learning
by: Wang, Yuanfei, et al.
Published: (2025)

Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning
by: Ran, Xingjian, et al.
Published: (2025)

Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs
by: Xia, Wenke, et al.
Published: (2023)

GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes
by: Chen, Xiao, et al.
Published: (2025)

Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
by: Wang, Yaoting, et al.
Published: (2024)

RPMArt: Towards Robust Perception and Manipulation for Articulated Objects
by: Wang, Junbo, et al.
Published: (2024)

EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control
by: Zhang, Chushan, et al.
Published: (2026)

Editable Concept Bottleneck Models
by: Hu, Lijie, et al.
Published: (2024)

Code2Video: A Code-centric Paradigm for Educational Video Generation
by: Chen, Yanzhe, et al.
Published: (2025)

SAEC: Scene-Aware Enhanced Edge-Cloud Collaborative Industrial Vision Inspection with Multimodal LLM
by: Tian, Yuhao, et al.
Published: (2025)

VeriGraph: Scene Graphs for Execution Verifiable Robot Planning
by: Ekpo, Daniel, et al.
Published: (2024)

ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving
by: Ruan, Qiyu, et al.
Published: (2026)