Bibliographic Details
Main Authors: Zhou, Zijun, Deng, Yingying, He, Xiangyu, Dong, Weiming, Tang, Fan
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2505.04320
author Zhou, Zijun
Deng, Yingying
He, Xiangyu
Dong, Weiming
Tang, Fan
contents Many real-world applications, such as interactive photo retouching, artistic content creation, and product design, require flexible and iterative image editing. However, existing image editing methods primarily focus on achieving the desired modifications in a single step and often struggle with ambiguous user intent, complex transformations, or the need for progressive refinements. As a result, these methods frequently produce inconsistent outcomes or fail to meet user expectations. To address these challenges, we propose a multi-turn image editing framework that enables users to iteratively refine their edits, progressively achieving more satisfactory results. Our approach leverages flow matching for accurate image inversion and a dual-objective Linear Quadratic Regulator (LQR) for stable sampling, effectively mitigating error accumulation. Additionally, by analyzing the layer-wise roles of transformers, we introduce an adaptive attention highlighting method that enhances editability while preserving multi-turn coherence. Extensive experiments demonstrate that our framework significantly improves edit success rates and visual fidelity compared to existing methods.
format Preprint
id arxiv_https___arxiv_org_abs_2505_04320
institution arXiv
publishDate 2025
record_format arxiv
title Multi-turn Consistent Image Editing
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2505.04320