Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yang; Zhang, Rui; Nie, Xuecheng; Li, Haochen; Chen, Jikun; Hao, Yifan; Zhang, Xin; Liu, Luoqi; Li, Ling
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2409.01327

_version_	1866912254197235712
author	Zhang, Yang; Zhang, Rui; Nie, Xuecheng; Li, Haochen; Chen, Jikun; Hao, Yifan; Zhang, Xin; Liu, Luoqi; Li, Ling
author_facet	Zhang, Yang; Zhang, Rui; Nie, Xuecheng; Li, Haochen; Chen, Jikun; Hao, Yifan; Zhang, Xin; Liu, Luoqi; Li, Ling
contents	Recent text-to-image models have achieved impressive results in generating high-quality images. However, when tasked with multi-concept generation (creating images that contain multiple characters or objects), existing methods often suffer from semantic entanglement, including concept entanglement and improper attribute binding, leading to significant text-image inconsistency. We identify that semantic entanglement arises when certain regions of the latent features attend to incorrect concept and attribute tokens. In this work, we propose the Semantic Protection Diffusion Model (SPDiffusion) to address both concept entanglement and improper attribute binding using only a text prompt as input. The SPDiffusion framework introduces a novel concept region extraction method, SP-Extraction, to resolve region entanglement in cross-attention, along with SP-Attn, which protects concept regions from the influence of irrelevant attributes and concepts. To evaluate our method, we test it on existing benchmarks, where SPDiffusion achieves state-of-the-art results, demonstrating its effectiveness.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_01327
institution	arXiv
publishDate	2024
record_format	arxiv
title	SPDiffusion: Semantic Protection Diffusion Models for Multi-concept Text-to-image Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2409.01327
