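| Main Authors: | HunyuanWorld Team; Wang, Zhenwei; Liu, Yuhao; Wu, Junta; Gu, Zixiao; Wang, Haoyuan; Zuo, Xuhui; Huang, Tianyu; Li, Wenhuan; Zhang, Sheng; Lian, Yihang; Tsai, Yulin; Wang, Lifu; Liu, Sicong; Jiang, Puhua; Yang, Xianghui; Guo, Dongyuan; Tang, Yixuan; Mao, Xinyue; Yu, Jiaao; Yu, Junlin; Zhang, Jihong; Chen, Meng; Dong, Liang; Jia, Yiwen; Zhang, Chao; Tan, Yonghao; Zhang, Hao; Ye, Zheng; He, Peng; Wu, Runzhou; Chen, Minghui; Li, Zhan; Qin, Wangchen; Wang, Lei; Sun, Yifu; Niu, Lin; Yuan, Xiang; Yang, Xiaofeng; He, Yingping; Xiao, Jie; Tao, Yangyu; Zhu, Jianchen; Xue, Jinbao; Liu, Kai; Zhao, Chongqing; Wu, Xinming; Liu, Tian; Chen, Peng; Wang, Di; Liu, Yuhong; Linus; Jiang, Jie; Wang, Tengfei; Guo, Chunchao |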
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2507.21809 |
| _version_ | 1866913987480780800 |
|---|---|
| author | HunyuanWorld Team; Wang, Zhenwei; Liu, Yuhao; Wu, Junta; Gu, Zixiao; Wang, Haoyuan; Zuo, Xuhui; Huang, Tianyu; Li, Wenhuan; Zhang, Sheng; Lian, Yihang; Tsai, Yulin; Wang, Lifu; Liu, Sicong; Jiang, Puhua; Yang, Xianghui; Guo, Dongyuan; Tang, Yixuan; Mao, Xinyue; Yu, Jiaao; Yu, Junlin; Zhang, Jihong; Chen, Meng; Dong, Liang; Jia, Yiwen; Zhang, Chao; Tan, Yonghao; Zhang, Hao; Ye, Zheng; He, Peng; Wu, Runzhou; Chen, Minghui; Li, Zhan; Qin, Wangchen; Wang, Lei; Sun, Yifu; Niu, Lin; Yuan, Xiang; Yang, Xiaofeng; He, Yingping; Xiao, Jie; Tao, Yangyu; Zhu, Jianchen; Xue, Jinbao; Liu, Kai; Zhao, Chongqing; Wu, Xinming; Liu, Tian; Chen, Peng; Wang, Di; Liu, Yuhong; Linus; Jiang, Jie; Wang, Tengfei; Guo, Chunchao |
| contents | Creating immersive and playable 3D worlds from texts or images remains a fundamental challenge in computer vision and graphics. Existing world generation approaches typically fall into two categories: video-based methods that offer rich diversity but lack 3D consistency and rendering efficiency, and 3D-based methods that provide geometric consistency but struggle with limited training data and memory-inefficient representations. To address these limitations, we present HunyuanWorld 1.0, a novel framework that combines the best of both worlds for generating immersive, explorable, and interactive 3D scenes from text and image conditions. Our approach features three key advantages: 1) 360° immersive experiences via panoramic world proxies; 2) mesh export capabilities for seamless compatibility with existing computer graphics pipelines; 3) disentangled object representations for augmented interactivity. The core of our framework is a semantically layered 3D mesh representation that leverages panoramic images as 360° world proxies for semantic-aware world decomposition and reconstruction, enabling the generation of diverse 3D worlds. Extensive experiments demonstrate that our method achieves state-of-the-art performance in generating coherent, explorable, and interactive 3D worlds while enabling versatile applications in virtual reality, physical simulation, game development, and interactive content creation. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2507_21809 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2507.21809 |