Bibliographic Details
Main Authors: Jin, Jian; Shen, Yang; Fu, Zhenyong; Yang, Jian
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2412.04831
author Jin, Jian
Shen, Yang
Fu, Zhenyong
Yang, Jian
contents Customized generation aims to incorporate a novel concept into a pre-trained text-to-image model, enabling the model to generate the concept in novel contexts guided by textual prompts. However, customized generation suffers from an inherent trade-off between concept fidelity and editability, i.e., between precisely modeling the concept and faithfully adhering to the prompts. Previous methods settle for a compromise and struggle to achieve both high concept fidelity and accurate prompt alignment simultaneously. In this paper, we propose a Divide, Conquer, then Integrate (DCI) framework, which performs a surgical adjustment in the early stage of denoising to free the fine-tuned model from the fidelity-editability trade-off at inference. The two conflicting components of the trade-off are decoupled and individually handled by two collaborative branches, whose outputs are then selectively integrated to preserve high concept fidelity while achieving faithful prompt adherence. To obtain a better fine-tuned model, we introduce an Image-specific Context Optimization (ICO) strategy for model customization. ICO replaces manual prompt templates with learnable image-specific contexts, providing an adaptive and precise fine-tuning direction that improves overall performance. Extensive experiments demonstrate the effectiveness of our method in reconciling the fidelity-editability trade-off.
format Preprint
id arxiv_https___arxiv_org_abs_2412_04831
institution arXiv
publishDate 2024
record_format arxiv
title Customized Generation Reimagined: Fidelity and Editability Harmonized
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2412.04831