Staff View: DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations :: Library Catalog

Bibliographic Details
Main Authors:	Qi, Tianhao; Fang, Shancheng; Wu, Yanze; Xie, Hongtao; Liu, Jiawei; Chen, Lang; He, Qian; Zhang, Yongdong
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.06951

_version_	1866914710732931072
author	Qi, Tianhao; Fang, Shancheng; Wu, Yanze; Xie, Hongtao; Liu, Jiawei; Chen, Lang; He, Qian; Zhang, Yongdong
author_facet	Qi, Tianhao; Fang, Shancheng; Wu, Yanze; Xie, Hongtao; Liu, Jiawei; Chen, Lang; He, Qian; Zhang, Yongdong
contents	Diffusion-based text-to-image models hold great potential for transferring the style of a reference image. However, current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles. In this paper, we introduce DEADiff to address this issue using the following two strategies: 1) A mechanism to decouple the style and semantics of reference images. The decoupled feature representations are first extracted by Q-Formers, which are instructed by different text descriptions. They are then injected into mutually exclusive subsets of cross-attention layers for better disentanglement. 2) A non-reconstructive learning method. The Q-Formers are trained on paired images rather than on an identical target: the reference image and the ground-truth image share the same style or the same semantics. We show that DEADiff attains the best visual stylization results and an optimal balance between the text controllability inherent in the text-to-image model and style similarity to the reference image, as demonstrated both quantitatively and qualitatively. Our project page is https://tianhao-qi.github.io/DEADiff/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_06951
institution	arXiv
publishDate	2024
record_format	arxiv
title	DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2403.06951
