Bibliographic Details
Main Authors: Zafarani, Ahmad, Dehghanian, Zahra, Davoodi, Mohammadreza, Shadroo, Mohsen, Fazli, MohammadAmin, Rabiee, Hamid R.
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2512.12287
author Zafarani, Ahmad
Dehghanian, Zahra
Davoodi, Mohammadreza
Shadroo, Mohsen
Fazli, MohammadAmin
Rabiee, Hamid R.
contents The evaluation of drag-based image editing models is unreliable due to a lack of standardized benchmarks and metrics. This ambiguity stems from inconsistent evaluation protocols and, critically, the absence of datasets containing ground-truth target images, making objective comparisons between competing methods difficult. To address this, we introduce RealDrag, the first comprehensive benchmark for point-based image editing that includes paired ground-truth target images. Our dataset contains over 400 human-annotated samples from diverse video sources, providing source/target images, handle/target points, editable-region masks, and descriptive captions for both the image and the editing action. We also propose four novel, task-specific metrics: Semantical Distance (SeD), Outer Mask Preserving Score (OMPS), Inner Patch Preserving Score (IPPS), and Directional Similarity (DiS). These metrics are designed to quantify pixel-level matching fidelity, check preservation of non-edited (out-of-mask) regions, and measure semantic alignment with the desired task. Using this benchmark, we conduct the first large-scale systematic analysis of the field, evaluating 17 state-of-the-art models. Our results reveal clear trade-offs among current approaches and establish a robust, reproducible baseline to guide future research. Our dataset and evaluation toolkit will be made publicly available.
format Preprint
id arxiv_https___arxiv_org_abs_2512_12287
institution arXiv
publishDate 2025
record_format arxiv
title RealDrag: The First Dragging Benchmark with Real Target Image
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2512.12287