Staff View: Envision3D: One Image to 3D with Anchor Views Interpolation :: Library Catalog

Bibliographic Details
Main Authors:	Pang, Yatian, Jia, Tanghui, Shi, Yujun, Tang, Zhenyu, Zhang, Junwu, Cheng, Xinhua, Zhou, Xing, Tay, Francis E. H., Yuan, Li
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.08902

_version_	1866911797591670784
author	Pang, Yatian; Jia, Tanghui; Shi, Yujun; Tang, Zhenyu; Zhang, Junwu; Cheng, Xinhua; Zhou, Xing; Tay, Francis E. H.; Yuan, Li
contents	We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image. Recent methods that extract 3D content from multi-view images generated by diffusion models show great potential. However, it remains challenging for diffusion models to generate dense, multi-view consistent images, which is crucial for the quality of 3D content extraction. To address this issue, we propose a novel cascade diffusion framework that decomposes the challenging dense-view generation task into two tractable stages: anchor-view generation and anchor-view interpolation. In the first stage, we train an image diffusion model to generate globally consistent anchor views conditioned on image-normal pairs. Subsequently, leveraging our video diffusion model fine-tuned on consecutive multi-view images, we interpolate between the anchor views to generate additional dense views. This framework yields dense, multi-view consistent images that provide comprehensive 3D information. To further enhance overall generation quality, we introduce a coarse-to-fine sampling strategy for the reconstruction algorithm to robustly extract textured meshes from the generated dense images. Extensive experiments demonstrate that our method generates high-quality 3D content in terms of both texture and geometry, surpassing previous image-to-3D baseline methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_08902
institution	arXiv
publishDate	2024
record_format	arxiv
title	Envision3D: One Image to 3D with Anchor Views Interpolation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2403.08902

Similar Items