Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Praharaj, Ipsita; Butala, Yukta; Praharaj, Badrikanath; Butala, Yash
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.12543

_version_	1866912575386550272
author	Praharaj, Ipsita; Butala, Yukta; Praharaj, Badrikanath; Butala, Yash
author_facet	Praharaj, Ipsita; Butala, Yukta; Praharaj, Badrikanath; Butala, Yash
contents	The rapid advancement of generative models has intensified the challenge of detecting and interpreting visual forgeries, creating a need for robust image forgery detection frameworks that also provide reasoning and localization. Existing works approach this problem through supervised training on specific manipulation types or through anomaly detection in the embedding space, but generalization across domains remains a challenge. We frame forgery detection as a prompt-driven visual reasoning task that leverages the semantic alignment capabilities of large vision-language models. We propose a framework, `REVEAL` (Reasoning and Evaluation of Visual Evidence through Aligned Language), that incorporates generalized guidelines. We propose two distinct approaches: (1) Holistic Scene-level Evaluation, which relies on the physics, semantics, perspective, and realism of the image as a whole, and (2) Region-wise Anomaly Detection, which splits the image into multiple regions and analyzes each of them. We conduct experiments on datasets from different domains (Photoshop, DeepFake, and AIGC editing), compare the vision-language models against competitive baselines, and analyze the reasoning they provide.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_12543
institution	arXiv
publishDate	2025
record_format	arxiv
title	REVEAL -- Reasoning and Evaluation of Visual Evidence through Aligned Language
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2508.12543