Saved in:
| Main Authors: | , |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.13490 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866929250536259584 |
|---|---|
| author | Wu, Chen De la Torre, Fernando |
| author_facet | Wu, Chen De la Torre, Fernando |
| contents | Text-to-image diffusion models have achieved remarkable performance in image synthesis, while the text interface does not always provide fine-grained control over certain image factors. For instance, changing a single token in the text can have unintended effects on the image. This paper shows a simple modification of classifier-free guidance can help disentangle image factors in text-to-image models. The key idea of our method, Contrastive Guidance, is to characterize an intended factor with two prompts that differ in minimal tokens: the positive prompt describes the image to be synthesized, and the baseline prompt serves as a "baseline" that disentangles other factors. Contrastive Guidance is a general method we illustrate whose benefits in three scenarios: (1) to guide domain-specific diffusion models trained on an object class, (2) to gain continuous, rig-like controls for text-to-image generation, and (3) to improve the performance of zero-shot image editors. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2402_13490 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| spellingShingle | Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models Wu, Chen De la Torre, Fernando Computer Vision and Pattern Recognition Text-to-image diffusion models have achieved remarkable performance in image synthesis, while the text interface does not always provide fine-grained control over certain image factors. For instance, changing a single token in the text can have unintended effects on the image. This paper shows a simple modification of classifier-free guidance can help disentangle image factors in text-to-image models. The key idea of our method, Contrastive Guidance, is to characterize an intended factor with two prompts that differ in minimal tokens: the positive prompt describes the image to be synthesized, and the baseline prompt serves as a "baseline" that disentangles other factors. Contrastive Guidance is a general method we illustrate whose benefits in three scenarios: (1) to guide domain-specific diffusion models trained on an object class, (2) to gain continuous, rig-like controls for text-to-image generation, and (3) to improve the performance of zero-shot image editors. |
| title | Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2402.13490 |