Bibliographic Details
Main Authors: Wang, Xi, Peng, Yichen, Fang, Heng, Wang, Yilin, Xie, Haoran, Yang, Xi, Li, Chuntao
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2404.13263
author Wang, Xi
Peng, Yichen
Fang, Heng
Wang, Yilin
Xie, Haoran
Yang, Xi
Li, Chuntao
contents In controllable generation tasks, flexibly manipulating generated images to attain a desired appearance or structure from a single input image cue remains a critical and longstanding challenge. Doing so requires effectively decoupling the key attributes of the input image so that each can be represented accurately. Previous work has concentrated predominantly on disentangling image attributes in feature space. However, the complex distributions of real-world data often make such decoupling algorithms difficult to apply to other datasets. Moreover, the granularity of control over feature encoding frequently fails to meet specific task requirements. After examining the characteristics of various generative models, we observe that the input sensitivity and dynamic evolution properties of the diffusion model can be effectively combined with explicit decomposition operations in pixel space. As a result, operations that we design and apply in pixel space achieve the desired control over specific representations in the generated results. We therefore propose FilterPrompt, an approach for enhancing controllable generation. It can be applied to any diffusion model, allowing users to adjust the representation of specific image features according to task requirements and thereby obtain more precise and controllable generation outcomes. In particular, our experiments demonstrate that FilterPrompt optimizes feature correlation, mitigates content conflicts during the generation process, and enhances the effect of controllable generation.
format Preprint
id arxiv_https___arxiv_org_abs_2404_13263
institution arXiv
publishDate 2024
record_format arxiv
title FilterPrompt: A Simple yet Efficient Approach to Guide Image Appearance Transfer in Diffusion Models
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2404.13263