Bibliographic Details
Main Authors: Zhang, Yang; Zhang, Rui; Nie, Xuecheng; Li, Haochen; Chen, Jikun; Hao, Yifan; Zhang, Xin; Liu, Luoqi; Li, Ling
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2409.01327
_version_ 1866912254197235712
author Zhang, Yang
Zhang, Rui
Nie, Xuecheng
Li, Haochen
Chen, Jikun
Hao, Yifan
Zhang, Xin
Liu, Luoqi
Li, Ling
contents Recent text-to-image models have achieved impressive results in generating high-quality images. However, in multi-concept generation, that is, creating images that contain multiple characters or objects, existing methods often suffer from semantic entanglement, which includes concept entanglement and improper attribute binding and leads to significant text-image inconsistency. We identify that semantic entanglement arises when certain regions of the latent features attend to incorrect concept and attribute tokens. In this work, we propose the Semantic Protection Diffusion Model (SPDiffusion) to address both concept entanglement and improper attribute binding using only a text prompt as input. The SPDiffusion framework introduces a novel concept region extraction method, SP-Extraction, to resolve region entanglement in cross-attention, along with SP-Attn, which protects concept regions from the influence of irrelevant attributes and concepts. We evaluate SPDiffusion on existing benchmarks, where it achieves state-of-the-art results.
format Preprint
id arxiv_https___arxiv_org_abs_2409_01327
institution arXiv
publishDate 2024
record_format arxiv
title SPDiffusion: Semantic Protection Diffusion Models for Multi-concept Text-to-image Generation
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2409.01327