Bibliographic Details
Main Authors: Li, Han; Sun, Hua
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2601.14954
_version_ 1866917343991431168
author Li, Han
Sun, Hua
author_facet Li, Han
Sun, Hua
contents Social media increasingly disseminates information through mixed image-text posts, but rumors often exploit subtle inconsistencies and forged content, which makes detection based solely on post content difficult. Deep semantic-mismatch rumors, in which image and text appear superficially consistent, are particularly challenging and threaten online public opinion. Existing multimodal rumor detection methods improve cross-modal modeling but suffer from limited feature extraction, noisy alignment, and inflexible fusion strategies, and they ignore the external factual evidence needed to verify complex rumors. To address these limitations, we propose a multimodal rumor detection model enhanced with external evidence and forgery features. The model uses a ResNet34 visual encoder, a BERT text encoder, and a forgery feature module that extracts frequency-domain traces and compression artifacts via the Fourier transform. BLIP-generated image descriptions bridge the image and text semantic spaces. A dual contrastive learning module computes contrastive losses over text-image and text-description pairs, improving detection of semantic inconsistencies. A gated adaptive feature-scaling fusion mechanism dynamically adjusts multimodal fusion and reduces redundancy. Experiments on Weibo and Twitter datasets show that our model outperforms mainstream baselines in macro accuracy, recall, and F1 score.
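Illustrative note: the abstract names a dual contrastive objective and a gated fusion step but gives no formulas. The sketch below is one plausible reading, not the authors' implementation. It assumes an InfoNCE-style loss over cosine similarities, embeddings already projected to a shared dimension, and a per-dimension sigmoid gate; the function names, the temperature value, and the gate parameters (`w_text`, `w_image`, `bias`) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i], not positives[j != i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def dual_contrastive_loss(text, image, description, temperature=0.1):
    """Assumed form: text-image loss plus text-description loss, equally weighted."""
    return info_nce(text, image, temperature) + info_nce(text, description, temperature)


def gated_fusion(text_vec, image_vec, w_text, w_image, bias):
    """Assumed gate: g = sigmoid(w_text*t + w_image*v + bias), fused = g*t + (1-g)*v."""
    fused = []
    for t, v, wt, wv, b in zip(text_vec, image_vec, w_text, w_image, bias):
        gate = 1.0 / (1.0 + math.exp(-(wt * t + wv * v + b)))
        fused.append(gate * t + (1.0 - gate) * v)
    return fused


# Toy batch of two posts with 2-dimensional embeddings (made-up values).
text = [[1.0, 0.0], [0.0, 1.0]]
image_aligned = [[1.0, 0.0], [0.0, 1.0]]
image_swapped = [[0.0, 1.0], [1.0, 0.0]]
description = [[1.0, 0.0], [0.0, 1.0]]

loss_aligned = dual_contrastive_loss(text, image_aligned, description)
loss_swapped = dual_contrastive_loss(text, image_swapped, description)
fused = gated_fusion([2.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

In this toy batch the loss is lower when each text sits next to its own image than when the images are swapped, which is the behavior a mismatch detector would rely on. With all gate parameters at zero the gate is 0.5, so the fused vector is the plain average of the two inputs; learned parameters would shift that balance per dimension.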
format Preprint
id arxiv_https___arxiv_org_abs_2601_14954
institution arXiv
publishDate 2026
record_format arxiv
title Multimodal Rumor Detection Enhanced by External Evidence and Forgery Features
topic Machine Learning
url https://arxiv.org/abs/2601.14954