Bibliographic Details
Main Author: Romani, Mohammad
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2511.14554
author Romani, Mohammad
author_facet Romani, Mohammad
contents Modern deepfakes evade detection by leaving subtle, domain-specific artifacts that single-branch networks miss. ForensicFlow addresses this by fusing evidence across three forensic dimensions: global visual inconsistencies (via ConvNeXt-tiny), fine-grained texture anomalies (via Swin Transformer-tiny), and spectral noise patterns (via a CNN with channel attention). Our attention-based temporal pooling dynamically prioritizes high-evidence frames, while adaptive fusion weights each branch according to forgery type. Trained on CelebDF(v2) with Focal Loss, the model achieves AUC 0.9752, F1 0.9408, and accuracy 0.9208, outperforming single-stream detectors. Ablation studies confirm branch synergy, and Grad-CAM visualizations validate focus on genuine manipulation regions (e.g., facial boundaries). This multi-domain fusion strategy establishes robustness against increasingly sophisticated forgeries.
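The abstract names three mechanisms without giving formulas: attention-based temporal pooling, adaptive fusion of branch outputs, and Focal Loss. The following is a minimal sketch of one common form of each, not the author's implementation. The function names, the softmax gating form, the use of scalar per-branch logits, and the defaults gamma=2.0 and alpha=0.25 (taken from the original focal loss formulation, not from this record) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, frame_scores):
    """Attention-weighted average of per-frame feature vectors.

    frame_scores are hypothetical per-frame evidence logits; in a trained
    model they would come from a small learned scoring layer.
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]


def adaptive_fusion(branch_logits, gate_scores):
    """Combine one logit per branch using softmax gate weights (assumed form)."""
    gates = softmax(gate_scores)
    return sum(g * z for g, z in zip(gates, branch_logits))


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted probability p (0 < p < 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights examples the model already classifies well.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With equal frame scores the pooling reduces to a plain average, and with gamma=0 the focal loss reduces to alpha-weighted cross-entropy, which gives two simple sanity checks for a sketch like this.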
format Preprint
id arxiv_https___arxiv_org_abs_2511_14554
institution arXiv
publishDate 2025
record_format arxiv
title ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection
topic Computer Vision and Pattern Recognition
Cryptography and Security
Machine Learning
url https://arxiv.org/abs/2511.14554