

Bibliographic Details
Main authors:	Duan, Yinglin; Zou, Zhengxia; Gu, Tongwei; Jia, Wei; Zhao, Zhan; Xu, Luyi; Liu, Xinzhu; Lin, Yenan; Jiang, Hao; Chen, Kang; Qiu, Shuang
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computer Vision and Pattern Recognition; Machine Learning
Online access:	https://arxiv.org/abs/2509.05263

_version_	1866911142390005760
author	Duan, Yinglin; Zou, Zhengxia; Gu, Tongwei; Jia, Wei; Zhao, Zhan; Xu, Luyi; Liu, Xinzhu; Lin, Yenan; Jiang, Hao; Chen, Kang; Qiu, Shuang
author_facet	Duan, Yinglin; Zou, Zhengxia; Gu, Tongwei; Jia, Wei; Zhao, Zhan; Xu, Luyi; Liu, Xinzhu; Lin, Yenan; Jiang, Hao; Chen, Kang; Qiu, Shuang
contents	Recent research has increasingly focused on developing 3D world models that simulate complex real-world scenarios. World models have found broad applications across various domains, including embodied AI, autonomous driving, and entertainment. A more realistic simulation with accurate physics will effectively narrow the sim-to-real gap and allow us to gather rich information about the real world conveniently. While traditional manual modeling has enabled the creation of virtual 3D scenes, modern approaches have leveraged advanced machine learning algorithms for 3D world generation, with the most recent advances focusing on generative methods that can create virtual worlds based on user instructions. This work explores this research direction by proposing LatticeWorld, a simple yet effective 3D world generation framework that streamlines the industrial production pipeline of 3D environments. LatticeWorld leverages lightweight LLMs (LLaMA-2-7B) alongside an industry-grade rendering engine (e.g., Unreal Engine 5) to generate a dynamic environment. Our proposed framework accepts textual descriptions and visual instructions as multimodal inputs and creates large-scale 3D interactive worlds with dynamic agents, featuring competitive multi-agent interaction, high-fidelity physics simulation, and real-time rendering. We conduct comprehensive experiments to evaluate LatticeWorld, showing that it achieves superior accuracy in scene layout generation and visual fidelity. Moreover, LatticeWorld achieves over a $90\times$ increase in industrial production efficiency while maintaining high creative quality compared with traditional manual production methods. Our demo video is available at https://youtu.be/8VWZXpERR18.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_05263
institution	arXiv
publishDate	2025
record_format	arxiv
title	LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
topic	Artificial Intelligence; Computer Vision and Pattern Recognition; Machine Learning
url	https://arxiv.org/abs/2509.05263
