Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Kim, Kangyeol; Seo, Wooseok; Nam, Sehyun; Kim, Bodam; Jeong, Suhyeon; Cho, Wonwoo; Choo, Jaegul; Yu, Youngjae
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2407.09779

_version_	1866913430257008640
author	Kim, Kangyeol; Seo, Wooseok; Nam, Sehyun; Kim, Bodam; Jeong, Suhyeon; Cho, Wonwoo; Choo, Jaegul; Yu, Youngjae
author_facet	Kim, Kangyeol; Seo, Wooseok; Nam, Sehyun; Kim, Bodam; Jeong, Suhyeon; Cho, Wonwoo; Choo, Jaegul; Yu, Youngjae
contents	Personalized text-to-image (P-T2I) generation aims to create new, text-guided images featuring a personalized subject given only a few reference images. However, balancing the trade-off between prompt fidelity and identity preservation remains a critical challenge. To address this issue, we propose a novel P-T2I method called Layout-and-Retouch, consisting of two stages: 1) layout generation and 2) retouch. In the first stage, our step-blended inference utilizes the inherent sample diversity of vanilla T2I models to produce diversified layout images while also enhancing prompt fidelity. In the second stage, multi-source attention swapping integrates the context image from the first stage with the reference image, taking the structure from the context image and the visual features from the reference image. This achieves high prompt fidelity while preserving identity characteristics. Through extensive experiments, we demonstrate that our method generates a wide variety of images with diverse layouts while maintaining the unique identity features of the personalized objects, even with challenging text prompts. This versatility highlights the potential of our framework to handle complex conditions, significantly enhancing the diversity and applicability of personalized image synthesis.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_09779
institution	arXiv
publishDate	2024
record_format	arxiv
title	Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2407.09779