Bibliographic Details
Main Authors: Zhang, Yi; Wang, Yunshuang; Zhang, Zeyu; Tang, Hao
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2602.11757
author Zhang, Yi
Wang, Yunshuang
Zhang, Zeyu
Tang, Hao
contents Achieving spatial intelligence requires moving beyond visual plausibility to build world simulators grounded in physical laws. While coding LLMs have advanced static 3D scene generation, extending this paradigm to 4D dynamics remains a critical frontier. This task presents two fundamental challenges: multi-scale context entanglement, where monolithic generation fails to balance local object structures with global environmental layouts; and a semantic-physical execution gap, where open-loop code generation leads to physical hallucinations lacking dynamic fidelity. We introduce Code2Worlds, a framework that formulates 4D generation as language-to-simulation code generation. First, we propose a dual-stream architecture that disentangles retrieval-augmented object generation from hierarchical environmental orchestration. Second, to ensure dynamic fidelity, we establish a physics-aware closed-loop mechanism in which a PostProcess Agent scripts dynamics, coupled with a VLM-Motion Critic that performs self-reflection to iteratively refine simulation code. Evaluations on the Code4D benchmark show Code2Worlds outperforms baselines with a 41% SGS gain and 49% higher Richness, while uniquely generating physics-aware dynamics absent in prior static methods. Code: https://github.com/AIGeeksGroup/Code2Worlds. Website: https://aigeeksgroup.github.io/Code2Worlds.
format Preprint
id arxiv_https___arxiv_org_abs_2602_11757
institution arXiv
publishDate 2026
record_format arxiv
title Code2Worlds: Empowering Coding LLMs for 4D World Generation
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.11757