Staff View: ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection :: Library Catalog

Bibliographic Details
Main Author:	Romani, Mohammad
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Cryptography and Security; Machine Learning
Online Access:	https://arxiv.org/abs/2511.14554

_version_	1866918265993822208
author	Romani, Mohammad
author_facet	Romani, Mohammad
contents	Modern deepfakes evade detection by leaving subtle, domain-specific artifacts that single-branch networks miss. ForensicFlow addresses this by fusing evidence across three forensic dimensions: global visual inconsistencies (via ConvNeXt-tiny), fine-grained texture anomalies (via Swin Transformer-tiny), and spectral noise patterns (via a CNN with channel attention). Our attention-based temporal pooling dynamically prioritizes high-evidence frames, while adaptive fusion weights each branch according to forgery type. Trained on CelebDF(v2) with Focal Loss, the model achieves AUC 0.9752, F1 0.9408, and accuracy 0.9208, outperforming single-stream detectors. Ablation studies confirm branch synergy, and Grad-CAM visualizations validate focus on genuine manipulation regions (e.g., facial boundaries). This multi-domain fusion strategy establishes robustness against increasingly sophisticated forgeries.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_14554
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection Romani, Mohammad Computer Vision and Pattern Recognition Cryptography and Security Machine Learning Modern deepfakes evade detection by leaving subtle, domain-specific artifacts that single-branch networks miss. ForensicFlow addresses this by fusing evidence across three forensic dimensions: global visual inconsistencies (via ConvNeXt-tiny), fine-grained texture anomalies (via Swin Transformer-tiny), and spectral noise patterns (via a CNN with channel attention). Our attention-based temporal pooling dynamically prioritizes high-evidence frames, while adaptive fusion weights each branch according to forgery type. Trained on CelebDF(v2) with Focal Loss, the model achieves AUC 0.9752, F1 0.9408, and accuracy 0.9208, outperforming single-stream detectors. Ablation studies confirm branch synergy, and Grad-CAM visualizations validate focus on genuine manipulation regions (e.g., facial boundaries). This multi-domain fusion strategy establishes robustness against increasingly sophisticated forgeries.
title	ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection
topic	Computer Vision and Pattern Recognition; Cryptography and Security; Machine Learning
url	https://arxiv.org/abs/2511.14554

Similar Items