_version_ 1866913987480780800
author HunyuanWorld Team
Wang, Zhenwei
Liu, Yuhao
Wu, Junta
Gu, Zixiao
Wang, Haoyuan
Zuo, Xuhui
Huang, Tianyu
Li, Wenhuan
Zhang, Sheng
Lian, Yihang
Tsai, Yulin
Wang, Lifu
Liu, Sicong
Jiang, Puhua
Yang, Xianghui
Guo, Dongyuan
Tang, Yixuan
Mao, Xinyue
Yu, Jiaao
Yu, Junlin
Zhang, Jihong
Chen, Meng
Dong, Liang
Jia, Yiwen
Zhang, Chao
Tan, Yonghao
Zhang, Hao
Ye, Zheng
He, Peng
Wu, Runzhou
Chen, Minghui
Li, Zhan
Qin, Wangchen
Wang, Lei
Sun, Yifu
Niu, Lin
Yuan, Xiang
Yang, Xiaofeng
He, Yingping
Xiao, Jie
Tao, Yangyu
Zhu, Jianchen
Xue, Jinbao
Liu, Kai
Zhao, Chongqing
Wu, Xinming
Liu, Tian
Chen, Peng
Wang, Di
Liu, Yuhong
Linus
Jiang, Jie
Wang, Tengfei
Guo, Chunchao
contents Creating immersive and playable 3D worlds from text or images remains a fundamental challenge in computer vision and graphics. Existing world generation approaches typically fall into two categories: video-based methods that offer rich diversity but lack 3D consistency and rendering efficiency, and 3D-based methods that provide geometric consistency but struggle with limited training data and memory-inefficient representations. To address these limitations, we present HunyuanWorld 1.0, a novel framework that combines the best of both worlds for generating immersive, explorable, and interactive 3D scenes from text and image conditions. Our approach features three key advantages: 1) 360° immersive experiences via panoramic world proxies; 2) mesh export capabilities for seamless compatibility with existing computer graphics pipelines; 3) disentangled object representations for augmented interactivity. The core of our framework is a semantically layered 3D mesh representation that leverages panoramic images as 360° world proxies for semantic-aware world decomposition and reconstruction, enabling the generation of diverse 3D worlds. Extensive experiments demonstrate that our method achieves state-of-the-art performance in generating coherent, explorable, and interactive 3D worlds while enabling versatile applications in virtual reality, physical simulation, game development, and interactive content creation.
format Preprint
id arxiv_https___arxiv_org_abs_2507_21809
institution arXiv
publishDate 2025
record_format arxiv
title HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2507.21809