Bibliographic Details
Main Authors: Praharaj, Ipsita; Butala, Yukta; Praharaj, Badrikanath; Butala, Yash
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2508.12543
author Praharaj, Ipsita
Butala, Yukta
Praharaj, Badrikanath
Butala, Yash
contents The rapid advancement of generative models has intensified the challenge of detecting and interpreting visual forgeries, necessitating robust frameworks that detect image forgeries while also providing reasoning and localization. While existing works approach this problem using supervised training for specific manipulations or anomaly detection in the embedding space, generalization across domains remains a challenge. We frame forgery detection as a prompt-driven visual reasoning task, leveraging the semantic alignment capabilities of large vision-language models. We propose a framework, `REVEAL` (Reasoning and Evaluation of Visual Evidence through Aligned Language), that incorporates generalized guidelines. We propose two tangential approaches: (1) Holistic Scene-level Evaluation, which relies on the physics, semantics, perspective, and realism of the image as a whole, and (2) Region-wise Anomaly Detection, which splits the image into multiple regions and analyzes each of them. We conduct experiments over datasets from different domains (Photoshop, DeepFake, and AIGC editing). We compare the vision-language models against competitive baselines and analyze the reasoning they provide.
format Preprint
id arxiv_https___arxiv_org_abs_2508_12543
institution arXiv
publishDate 2025
record_format arxiv
title REVEAL -- Reasoning and Evaluation of Visual Evidence through Aligned Language
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2508.12543