Staff View: MaterialPicker: Multi-Modal DiT-Based Material Generation :: Library Catalog

Bibliographic Details
Main Authors:	Ma, Xiaohe; Deschaintre, Valentin; Hašan, Miloš; Luan, Fujun; Zhou, Kun; Wu, Hongzhi; Hu, Yiwei
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.03225

_version_	1866912501711503360
author	Ma, Xiaohe; Deschaintre, Valentin; Hašan, Miloš; Luan, Fujun; Zhou, Kun; Wu, Hongzhi; Hu, Yiwei
author_facet	Ma, Xiaohe; Deschaintre, Valentin; Hašan, Miloš; Luan, Fujun; Zhou, Kun; Wu, Hongzhi; Hu, Yiwei
contents	High-quality material generation is key for virtual environment authoring and inverse rendering. We propose MaterialPicker, a multi-modal material generator leveraging a Diffusion Transformer (DiT) architecture, improving and simplifying the creation of high-quality materials from text prompts and/or photographs. Our method can generate a material based on an image crop of a material sample, even if the captured surface is distorted, viewed at an angle or partially occluded, as is often the case in photographs of natural scenes. We further allow the user to specify a text prompt to provide additional guidance for the generation. We finetune a pre-trained DiT-based video generator into a material generator, where each material map is treated as a frame in a video sequence. We evaluate our approach both quantitatively and qualitatively and show that it enables more diverse material generation and better distortion correction than previous work.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_03225
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	MaterialPicker: Multi-Modal DiT-Based Material Generation Ma, Xiaohe Deschaintre, Valentin Hašan, Miloš Luan, Fujun Zhou, Kun Wu, Hongzhi Hu, Yiwei Computer Vision and Pattern Recognition High-quality material generation is key for virtual environment authoring and inverse rendering. We propose MaterialPicker, a multi-modal material generator leveraging a Diffusion Transformer (DiT) architecture, improving and simplifying the creation of high-quality materials from text prompts and/or photographs. Our method can generate a material based on an image crop of a material sample, even if the captured surface is distorted, viewed at an angle or partially occluded, as is often the case in photographs of natural scenes. We further allow the user to specify a text prompt to provide additional guidance for the generation. We finetune a pre-trained DiT-based video generator into a material generator, where each material map is treated as a frame in a video sequence. We evaluate our approach both quantitatively and qualitatively and show that it enables more diverse material generation and better distortion correction than previous work.
title	MaterialPicker: Multi-Modal DiT-Based Material Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2412.03225

Similar Items