Staff View: Move Anything with Layered Scene Diffusion :: Library Catalog

Bibliographic Details
Main Authors:	Ren, Jiawei; Xu, Mengmeng; Wu, Jui-Chieh; Liu, Ziwei; Xiang, Tao; Toisoul, Antoine
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.07178

_version_	1866910405465473024
author	Ren, Jiawei; Xu, Mengmeng; Wu, Jui-Chieh; Liu, Ziwei; Xiang, Tao; Toisoul, Antoine
author_facet	Ren, Jiawei; Xu, Mengmeng; Wu, Jui-Chieh; Liu, Ziwei; Xiang, Tao; Toisoul, Antoine
contents	Diffusion models generate images with an unprecedented level of quality, but how can we freely rearrange image layouts? Recent works generate controllable scenes via learning spatially disentangled latent codes, but these methods do not apply to diffusion models due to their fixed forward process. In this work, we propose SceneDiffusion to optimize a layered scene representation during the diffusion sampling process. Our key insight is that spatial disentanglement can be obtained by jointly denoising scene renderings at different spatial layouts. Our generated scenes support a wide range of spatial editing operations, including moving, resizing, cloning, and layer-wise appearance editing operations, including object restyling and replacing. Moreover, a scene can be generated conditioned on a reference image, thus enabling object moving for in-the-wild images. Notably, this approach is training-free, compatible with general text-to-image diffusion models, and responsive in less than a second.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_07178
institution	arXiv
publishDate	2024
record_format	arxiv
title	Move Anything with Layered Scene Diffusion
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2404.07178