Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Hu, Yihan; Peng, Jianing; Lin, Yiheng; Liu, Ting; Qu, Xiaochao; Liu, Luoqi; Zhao, Yao; Wei, Yunchao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.16795

_version_	1866908276766015488
author	Hu, Yihan; Peng, Jianing; Lin, Yiheng; Liu, Ting; Qu, Xiaochao; Liu, Luoqi; Zhao, Yao; Wei, Yunchao
author_facet	Hu, Yihan; Peng, Jianing; Lin, Yiheng; Liu, Ting; Qu, Xiaochao; Liu, Luoqi; Zhao, Yao; Wei, Yunchao
contents	This paper presents a novel approach to improving text-guided image editing using diffusion-based models. The key challenge of text-guided image editing is to precisely locate and edit the target semantics, and previous methods fall short in this respect. Our method introduces a Precise Semantic Localization strategy that leverages visual and textual self-attention to enhance the cross-attention map, which can serve as regional cues to improve editing performance. We then propose a Dual-Level Control mechanism for incorporating regional cues at both the feature and latent levels, offering fine-grained control for more precise edits. To fully compare our method with other DiT-based approaches, we construct the RW-800 benchmark, featuring high-resolution images, long descriptive texts, real-world images, and a new text editing task. Experimental results on the popular PIE-Bench and RW-800 benchmarks demonstrate the superior performance of our approach in preserving background and providing accurate edits.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_16795
institution	arXiv
publishDate	2025
record_format	arxiv
title	DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2503.16795
