MARC21: :: Library Catalog

Bibliographic Details
Main Authors:	Yu, Yonghao; Zhu, Shunan; Qin, Huai; Li, Haorui
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2401.16764

_version_	1866914526598791168
author	Yu, Yonghao; Zhu, Shunan; Qin, Huai; Li, Haorui
author_facet	Yu, Yonghao; Zhu, Shunan; Qin, Huai; Li, Haorui
contents	Following the evolution of text-to-image diffusion models, text-to-3D generation has made significant strides. Two paradigms currently dominate text-to-3D: feed-forward generation, which produces 3D assets quickly but often yields coarse results, and Score Distillation Sampling (SDS) based methods, which generate high-fidelity 3D assets but more slowly. Integrating the two holds substantial promise for advancing 3D generation. In this paper, we present BoostDream, a highly efficient plug-and-play 3D refining method designed to transform coarse 3D assets into high-quality ones. The BoostDream framework comprises three processes: (1) We introduce 3D model distillation, which fits differentiable representations from the 3D assets obtained through feed-forward generation. (2) We design a novel multi-view SDS loss, which uses a multi-view aware 2D diffusion model to refine the 3D assets. (3) We propose using the prompt and multi-view consistent normal maps as guidance during refinement. Extensive experiments on different differentiable 3D representations show that BoostDream generates high-quality 3D assets rapidly and overcomes the Janus problem compared to conventional SDS-based methods. This represents a substantial advance in both the efficiency and quality of 3D generation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_16764
institution	arXiv
publishDate	2024
record_format	arxiv
title	BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2401.16764
