Staff View: TKG-DM: Training-free Chroma Key Content Generation Diffusion Model :: Library Catalog

Bibliographic Details
Main Authors:	Morita, Ryugo; Frolov, Stanislav; Moser, Brian Bernhard; Shirakawa, Takahiro; Watanabe, Ko; Dengel, Andreas; Zhou, Jinjia
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2411.15580

_version_	1866912515398565888
author	Morita, Ryugo; Frolov, Stanislav; Moser, Brian Bernhard; Shirakawa, Takahiro; Watanabe, Ko; Dengel, Andreas; Zhou, Jinjia
author_facet	Morita, Ryugo; Frolov, Stanislav; Moser, Brian Bernhard; Shirakawa, Takahiro; Watanabe, Ko; Dengel, Andreas; Zhou, Jinjia
contents	Diffusion models have enabled the generation of high-quality images with a strong focus on realism and textual fidelity. Yet, large-scale text-to-image models, such as Stable Diffusion, struggle to generate images where foreground objects are placed over a chroma key background, limiting their ability to separate foreground and background elements without fine-tuning. To address this limitation, we present a novel Training-Free Chroma Key Content Generation Diffusion Model (TKG-DM), which optimizes the initial random noise to produce images with foreground objects on a specifiable color background. Our proposed method is the first to explore the manipulation of the color aspects in initial noise for controlled background generation, enabling precise separation of foreground and background without fine-tuning. Extensive experiments demonstrate that our training-free method outperforms existing methods in both qualitative and quantitative evaluations, matching or surpassing fine-tuned models. Finally, we successfully extend it to other tasks (e.g., consistency models and text-to-video), highlighting its transformative potential across various generative applications where independent control of foreground and background is crucial.
format	Preprint
id	arxiv_https___arxiv_org_abs_2411_15580
institution	arXiv
publishDate	2024
record_format	arxiv
title	TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2411.15580