Staff View: EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM :: Library Catalog

Bibliographic Details
Main Authors:	Nguyen, Quang; Vu, Truong; Nguyen, Trong-Tung; Wen, Yuxin; Robinette, Preston K; Johnson, Taylor T; Goldstein, Tom; Tran, Anh; Nguyen, Khoi
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.03809

_version_	1866909417038938112
author	Nguyen, Quang; Vu, Truong; Nguyen, Trong-Tung; Wen, Yuxin; Robinette, Preston K; Johnson, Taylor T; Goldstein, Tom; Tran, Anh; Nguyen, Khoi
author_facet	Nguyen, Quang; Vu, Truong; Nguyen, Trong-Tung; Wen, Yuxin; Robinette, Preston K; Johnson, Taylor T; Goldstein, Tom; Tran, Anh; Nguyen, Khoi
contents	Image editing technologies are tools used to transform, adjust, remove, or otherwise alter images. Recent research has significantly improved the capabilities of image editing tools, enabling the creation of photorealistic and semantically informed forged regions that are nearly indistinguishable from authentic imagery, presenting new challenges in digital forensics and media credibility. While current image forensic techniques are adept at localizing forged regions produced by traditional image manipulation methods, current capabilities struggle to localize regions created by diffusion-based techniques. To bridge this gap, we present a novel framework that integrates a multimodal Large Language Model (LLM) for enhanced reasoning capabilities to localize tampered regions in images produced by diffusion model-based editing methods. By leveraging the contextual and semantic strengths of LLMs, our framework achieves promising results on MagicBrush, AutoSplice, and PerfBrush (novel diffusion-based dataset) datasets, outperforming previous approaches in mIoU and F1-score metrics. Notably, our method excels on the PerfBrush dataset, a self-constructed test set featuring previously unseen types of edits. Here, where traditional methods typically falter, achieving markedly low scores, our approach demonstrates promising performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_03809
institution	arXiv
publishDate	2024
record_format	arxiv
title	EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2412.03809

Similar Items