Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Chen, Yongwei; Wang, Tengfei; Wu, Tong; Pan, Xingang; Jia, Kui; Liu, Ziwei
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.12409

_version_	1866929282087911424
author	Chen, Yongwei; Wang, Tengfei; Wu, Tong; Pan, Xingang; Jia, Kui; Liu, Ziwei
author_facet	Chen, Yongwei; Wang, Tengfei; Wu, Tong; Pan, Xingang; Jia, Kui; Liu, Ziwei
contents	Generating high-quality 3D assets from a given image is highly desirable in various applications such as AR/VR. Recent advances in single-image 3D generation explore feed-forward models that learn to infer the 3D model of an object without optimization. Though promising results have been achieved in single-object generation, these methods often struggle to model complex 3D assets that inherently contain multiple objects. In this work, we present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models. 1) We first perform an in-depth analysis of this "multi-object gap" from both model and data perspectives. 2) Next, with reconstructed 3D models of different objects, we seek to adjust their sizes, rotation angles, and locations to create a 3D asset that matches the given image. 3) To automate this process, we apply spatially-aware score distillation sampling (SSDS) from pretrained diffusion models to guide the positioning of objects. Compared with standard score distillation sampling, our proposed framework emphasizes the spatial alignment of objects and thus achieves more accurate results. Extensive experiments validate that ComboVerse achieves clear improvements over existing methods in generating compositional 3D assets.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_12409
institution	arXiv
publishDate	2024
record_format	arxiv
title	ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2403.12409

Similar Items