Bibliographic Details
Main Authors:	Zafarani, Ahmad; Dehghanian, Zahra; Davoodi, Mohammadreza; Shadroo, Mohsen; Fazli, MohammadAmin; Rabiee, Hamid R.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.12287

_version_	1866908708674469888
author	Zafarani, Ahmad; Dehghanian, Zahra; Davoodi, Mohammadreza; Shadroo, Mohsen; Fazli, MohammadAmin; Rabiee, Hamid R.
author_facet	Zafarani, Ahmad; Dehghanian, Zahra; Davoodi, Mohammadreza; Shadroo, Mohsen; Fazli, MohammadAmin; Rabiee, Hamid R.
contents	The evaluation of drag-based image editing models is unreliable due to a lack of standardized benchmarks and metrics. This ambiguity stems from inconsistent evaluation protocols and, critically, the absence of datasets containing ground-truth target images, making objective comparisons between competing methods difficult. To address this, we introduce RealDrag, the first comprehensive benchmark for point-based image editing that includes paired ground-truth target images. Our dataset contains over 400 human-annotated samples from diverse video sources, providing source/target images, handle/target points, editable region masks, and descriptive captions for both the image and the editing action. We also propose four novel, task-specific metrics: Semantical Distance (SeD), Outer Mask Preserving Score (OMPS), Inner Patch Preserving Score (IPPS), and Directional Similarity (DiS). These metrics are designed to quantify pixel-level matching fidelity, check preservation of non-edited (out-of-mask) regions, and measure semantic alignment with the desired task. Using this benchmark, we conduct the first large-scale systematic analysis of the field, evaluating 17 SOTA models. Our results reveal clear trade-offs among current approaches and establish a robust, reproducible baseline to guide future research. Our dataset and evaluation toolkit will be made publicly available.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_12287
institution	arXiv
publishDate	2025
record_format	arxiv
title	RealDrag: The First Dragging Benchmark with Real Target Image
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2512.12287

Similar Items