Bibliographic Details
Main Authors: Xi, Aprille J., Chen, Eason
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2501.15656
author Xi, Aprille J.
Chen, Eason
contents The proliferation of deepfake technology poses significant challenges to the authenticity and trustworthiness of digital media, necessitating the development of robust detection methods. This study explores the application of Swin Transformers, a state-of-the-art architecture leveraging shifted windows for self-attention, in detecting and classifying deepfake images. Using the Real and Fake Face Detection dataset by Yonsei University's Computational Intelligence and Photography Lab, we evaluate the Swin Transformer and hybrid models such as Swin-ResNet and Swin-KNN, focusing on their ability to identify subtle manipulation artifacts. Our results demonstrate that the Swin Transformer outperforms conventional CNN-based architectures, including VGG16, ResNet18, and AlexNet, achieving a test accuracy of 71.29%. Additionally, we present insights into hybrid model design, highlighting the complementary strengths of transformer and CNN-based approaches in deepfake detection. This study underscores the potential of transformer-based architectures for improving accuracy and generalizability in image-based manipulation detection, paving the way for more effective countermeasures against deepfake threats.
format Preprint
id arxiv_https___arxiv_org_abs_2501_15656
institution arXiv
publishDate 2025
record_format arxiv
title Classifying Deepfakes Using Swin Transformers
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2501.15656