Bibliographic Details
Main Authors: Sagar, A S M Sharifuzzaman; Bennamoun, Mohammed; Boussaid, Farid; Sharif, Naeha; Xu, Lian; Sahmoud, Shaaban; Kishk, Ali
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2602.01854
_version_ 1866910008627691520
author Sagar, A S M Sharifuzzaman
Bennamoun, Mohammed
Boussaid, Farid
Sharif, Naeha
Xu, Lian
Sahmoud, Shaaban
Kishk, Ali
contents In multimodal misinformation, deception usually arises not only from pixel-level manipulations in an image, but also from the semantic and contextual claim jointly expressed by the image-text pair. Yet most deepfake detectors, engineered to detect pixel-level forgeries, do not account for claim-level meaning, despite their growing integration into automated fact-checking (AFC) pipelines. This raises a central scientific and practical question: Do pixel-level detectors contribute useful signal for verifying image-text claims, or do they instead introduce misleading authenticity priors that undermine evidence-based reasoning? We provide the first systematic analysis of deepfake detectors in the context of multimodal misinformation detection. Using two complementary benchmarks, MMFakeBench and DGM4, we evaluate: (1) state-of-the-art image-only deepfake detectors, (2) an evidence-driven fact-checking system that performs tool-guided retrieval via Monte Carlo Tree Search (MCTS) and engages in deliberative inference through Multi-Agent Debate (MAD), and (3) a hybrid fact-checking system that injects detector outputs as auxiliary evidence. Across both benchmarks, deepfake detectors offer limited standalone value, with F1 scores of 0.26-0.53 on MMFakeBench and 0.33-0.49 on DGM4, and incorporating their predictions into fact-checking pipelines consistently reduces performance by 0.04-0.08 F1 due to non-causal authenticity assumptions. In contrast, the evidence-centric fact-checking system achieves the highest performance, reaching F1 scores of approximately 0.81 on MMFakeBench and 0.55 on DGM4. Overall, our findings demonstrate that multimodal claim verification is driven primarily by semantic understanding and external evidence, and that pixel-level artifact signals do not reliably enhance reasoning over real-world image-text misinformation.
format Preprint
id arxiv_https___arxiv_org_abs_2602_01854
institution arXiv
publishDate 2026
record_format arxiv
title Fact or Fake? Assessing the Role of Deepfake Detectors in Multimodal Misinformation Detection
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.01854