Staff View: HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation

Bibliographic Details
Main Authors:	Mercier, Antoine; Nakhli, Ramin; Reddy, Mahesh; Yasarla, Rajeev; Cai, Hong; Porikli, Fatih; Berger, Guillaume
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2401.07727

author	Mercier, Antoine; Nakhli, Ramin; Reddy, Mahesh; Yasarla, Rajeev; Cai, Hong; Porikli, Fatih; Berger, Guillaume
author_facet	Mercier, Antoine; Nakhli, Ramin; Reddy, Mahesh; Yasarla, Rajeev; Cai, Hong; Porikli, Fatih; Berger, Guillaume
contents	Despite the latest remarkable advances in generative modeling, efficient generation of high-quality 3D assets from textual prompts remains a difficult task. A key challenge lies in data scarcity: the most extensive 3D datasets encompass merely millions of assets, while their 2D counterparts contain billions of text-image pairs. To address this, we propose a novel approach that harnesses the power of large, pretrained 2D diffusion models. More specifically, our approach, HexaGen3D, fine-tunes a pretrained text-to-image model to jointly predict 6 orthographic projections and the corresponding latent triplane. We then decode these latents to generate a textured mesh. HexaGen3D does not require per-sample optimization and can infer high-quality and diverse objects from textual prompts in 7 seconds, offering significantly better quality-to-latency trade-offs compared to existing approaches. Furthermore, HexaGen3D demonstrates strong generalization to new objects or compositions.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_07727
institution	arXiv
publishDate	2024
record_format	arxiv
title	HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2401.07727