Bibliographic Details
Main Authors: Alharbi, Yazeed, Wonka, Peter
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2403.12585
author Alharbi, Yazeed
Wonka, Peter
contents We present a novel, training-free approach for textual editing of real images using diffusion models. Unlike prior methods that rely on computationally expensive finetuning, our approach leverages LAtent SPatial Alignment (LASPA) to efficiently preserve image details. We demonstrate how the diffusion process is amenable to spatial guidance using a reference image, leading to semantically coherent edits. This eliminates the need for complex optimization and costly model finetuning, resulting in significantly faster editing compared to previous methods. Additionally, our method avoids the storage requirements associated with large finetuned models. These advantages make our approach particularly well-suited for editing on mobile devices and applications demanding rapid response times. While simple and fast, our method achieves 62-71% preference in a user study and significantly better model-based editing strength and image preservation scores.
format Preprint
id arxiv_https___arxiv_org_abs_2403_12585
institution arXiv
publishDate 2024
record_format arxiv
title LASPA: Latent Spatial Alignment for Fast Training-free Single Image Editing
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2403.12585