MARC View: :: Library Catalog

Bibliographic Details
Main Authors:	Santos, Rodrigo; Silva, João; Branco, António
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.08004

_version_	1866917856194592768
author	Santos, Rodrigo; Silva, João; Branco, António
author_facet	Santos, Rodrigo; Silva, João; Branco, António
contents	The combination of language processing and image processing keeps attracting increased interest given recent impressive advances that leverage the combined strengths of both domains of research. Among these advances, the task of editing an image on the basis solely of a natural language instruction stands out as a most challenging endeavour. While recent approaches for this task resort, in one way or other, to some form of preliminary preparation, training or fine-tuning, this paper explores a novel approach: We propose a preparation-free method that permits instruction-guided image editing on the fly. This approach is organized along three steps properly orchestrated that resort to image captioning and DDIM inversion, followed by obtaining the edit direction embedding, followed by image editing proper. While dispensing with preliminary preparation, our approach demonstrates to be effective and competitive, outperforming recent, state of the art models for this task when evaluated on the MAGICBRUSH dataset.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_08004
institution	arXiv
publishDate	2024
record_format	arxiv
title	Leveraging LLMs for On-the-Fly Instruction Guided Image Editing
topic	Computation and Language; Artificial Intelligence; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2403.08004
