Saved in:
| Main Authors: | , , , , , , |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.10682 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866914219389091840 |
|---|---|
| author | Bhattacharjee, Anindya Islam, Kaidul Anan, Kafi Intesher, Ashir Fuad, Abrar Assaeem Saha, Utsab Imtiaz, Hafiz |
| author_facet | Bhattacharjee, Anindya Islam, Kaidul Anan, Kafi Intesher, Ashir Fuad, Abrar Assaeem Saha, Utsab Imtiaz, Hafiz |
| contents | The spread of deepfakes poses significant security concerns, demanding reliable detection methods. However, diverse generation techniques and class imbalance in datasets create challenges. We propose CAE-Net, a Convolution- and Attention-based weighted Ensemble network combining spatial and frequency-domain features for effective deepfake detection. The architecture integrates EfficientNet, Data-Efficient Image Transformer (DeiT), and ConvNeXt with wavelet features to learn complementary representations. We evaluated CAE-Net on the diverse IEEE Signal Processing Cup 2025 (DF-Wild Cup) dataset, which has a 5:1 fake-to-real class imbalance. To address this, we introduce a multistage disjoint-subset training strategy, sequentially training the model on non-overlapping subsets of the fake class while retaining knowledge across stages. Our approach achieved $94.46\%$ accuracy and a $97.60\%$ AUC, outperforming conventional class-balancing methods. Visualizations confirm the network focuses on meaningful facial regions, and our ensemble design demonstrates robustness against adversarial attacks, positioning CAE-Net as a dependable and generalized deepfake detection framework. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2502_10682 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | CAE-Net: Generalized Deepfake Image Detection using Convolution and Attention Mechanisms with Spatial and Frequency Domain Features Bhattacharjee, Anindya Islam, Kaidul Anan, Kafi Intesher, Ashir Fuad, Abrar Assaeem Saha, Utsab Imtiaz, Hafiz Computer Vision and Pattern Recognition Machine Learning Image and Video Processing The spread of deepfakes poses significant security concerns, demanding reliable detection methods. However, diverse generation techniques and class imbalance in datasets create challenges. We propose CAE-Net, a Convolution- and Attention-based weighted Ensemble network combining spatial and frequency-domain features for effective deepfake detection. The architecture integrates EfficientNet, Data-Efficient Image Transformer (DeiT), and ConvNeXt with wavelet features to learn complementary representations. We evaluated CAE-Net on the diverse IEEE Signal Processing Cup 2025 (DF-Wild Cup) dataset, which has a 5:1 fake-to-real class imbalance. To address this, we introduce a multistage disjoint-subset training strategy, sequentially training the model on non-overlapping subsets of the fake class while retaining knowledge across stages. Our approach achieved $94.46\%$ accuracy and a $97.60\%$ AUC, outperforming conventional class-balancing methods. Visualizations confirm the network focuses on meaningful facial regions, and our ensemble design demonstrates robustness against adversarial attacks, positioning CAE-Net as a dependable and generalized deepfake detection framework. |
| title | CAE-Net: Generalized Deepfake Image Detection using Convolution and Attention Mechanisms with Spatial and Frequency Domain Features |
| topic | Computer Vision and Pattern Recognition Machine Learning Image and Video Processing |
| url | https://arxiv.org/abs/2502.10682 |