Staff View: Fact or Fake? Assessing the Role of Deepfake Detectors in Multimodal Misinformation Detection :: Library Catalog

Bibliographic Details
Main Authors:	Sagar, A S M Sharifuzzaman; Bennamoun, Mohammed; Boussaid, Farid; Sharif, Naeha; Xu, Lian; Sahmoud, Shaaban; Kishk, Ali
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.01854

_version_	1866910008627691520
author	Sagar, A S M Sharifuzzaman; Bennamoun, Mohammed; Boussaid, Farid; Sharif, Naeha; Xu, Lian; Sahmoud, Shaaban; Kishk, Ali
author_facet	Sagar, A S M Sharifuzzaman; Bennamoun, Mohammed; Boussaid, Farid; Sharif, Naeha; Xu, Lian; Sahmoud, Shaaban; Kishk, Ali
contents	In multimodal misinformation, deception usually arises not just from pixel-level manipulations in an image, but from the semantic and contextual claim jointly expressed by the image-text pair. Yet most deepfake detectors, engineered to detect pixel-level forgeries, do not account for claim-level meaning, despite their growing integration in automated fact-checking (AFC) pipelines. This raises a central scientific and practical question: Do pixel-level detectors contribute useful signal for verifying image-text claims, or do they instead introduce misleading authenticity priors that undermine evidence-based reasoning? We provide the first systematic analysis of deepfake detectors in the context of multimodal misinformation detection. Using two complementary benchmarks, MMFakeBench and DGM4, we evaluate: (1) state-of-the-art image-only deepfake detectors, (2) an evidence-driven fact-checking system that performs tool-guided retrieval via Monte Carlo Tree Search (MCTS) and engages in deliberative inference through Multi-Agent Debate (MAD), and (3) a hybrid fact-checking system that injects detector outputs as auxiliary evidence. Results across both benchmark datasets show that deepfake detectors offer limited standalone value, achieving F1 scores in the range of 0.26-0.53 on MMFakeBench and 0.33-0.49 on DGM4, and that incorporating their predictions into fact-checking pipelines consistently reduces performance by 0.04-0.08 F1 due to non-causal authenticity assumptions. In contrast, the evidence-centric fact-checking system achieves the highest performance, reaching F1 scores of approximately 0.81 on MMFakeBench and 0.55 on DGM4. Overall, our findings demonstrate that multimodal claim verification is driven primarily by semantic understanding and external evidence, and that pixel-level artifact signals do not reliably enhance reasoning over real-world image-text misinformation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_01854
institution	arXiv
publishDate	2026
record_format	arxiv
title	Fact or Fake? Assessing the Role of Deepfake Detectors in Multimodal Misinformation Detection
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.01854

Similar Items