Salvato in:
| Autori principali: | , , , , |
|---|---|
| Natura: | Preprint |
| Pubblicazione: |
2023
|
| Soggetti: | |
| Accesso online: | https://arxiv.org/abs/2311.03782 |
| Tags: |
Aggiungi Tag
Nessun Tag, puoi essere il primo ad aggiungerne!!
|
| _version_ | 1866909646350974976 |
|---|---|
| author | Ahmad, Wasim Peng, Yan-Tsung Chang, Yuan-Hao Ganfure, Gaddisa Olani Khan, Sarwar |
| author_facet | Ahmad, Wasim Peng, Yan-Tsung Chang, Yuan-Hao Ganfure, Gaddisa Olani Khan, Sarwar |
| contents | Deep-fake videos, generated through AI face-swapping techniques, have gained significant attention due to their potential for impactful impersonation attacks. While most research focuses on real vs. fake detection, attributing a deep-fake to its specific generation model or encoder is vital for forensic analysis, enabling source tracing and tailored countermeasures. This enhances detection by leveraging model-specific artifacts and supports proactive defenses. We investigate the model attribution problem for deep-fake videos using two datasets: Deepfakes from Different Models (DFDM) and GANGen-Detection, both comprising deep-fake videos and GAN-generated images. We use only fake images from GANGen-Detection to align with DFDM's focus on attribution rather than binary classification. We formulate the task as a multiclass classification problem and introduce a novel Capsule-Spatial-Temporal (CapST) model that integrates a truncated VGG19 network for feature extraction, capsule networks for hierarchical encoding, and a spatio-temporal attention mechanism. Video-level fusion captures temporal dependencies across frames. Experiments on DFDM and GANGen-Detection show CapST outperforms baseline models in attribution accuracy while reducing computational cost. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2311_03782 |
| institution | arXiv |
| publishDate | 2023 |
| record_format | arxiv |
| spellingShingle | CapST: Leveraging Capsule Networks and Temporal Attention for Accurate Model Attribution in Deep-fake Videos Ahmad, Wasim Peng, Yan-Tsung Chang, Yuan-Hao Ganfure, Gaddisa Olani Khan, Sarwar Computer Vision and Pattern Recognition acmart Deep-fake videos, generated through AI face-swapping techniques, have gained significant attention due to their potential for impactful impersonation attacks. While most research focuses on real vs. fake detection, attributing a deep-fake to its specific generation model or encoder is vital for forensic analysis, enabling source tracing and tailored countermeasures. This enhances detection by leveraging model-specific artifacts and supports proactive defenses. We investigate the model attribution problem for deep-fake videos using two datasets: Deepfakes from Different Models (DFDM) and GANGen-Detection, both comprising deep-fake videos and GAN-generated images. We use only fake images from GANGen-Detection to align with DFDM's focus on attribution rather than binary classification. We formulate the task as a multiclass classification problem and introduce a novel Capsule-Spatial-Temporal (CapST) model that integrates a truncated VGG19 network for feature extraction, capsule networks for hierarchical encoding, and a spatio-temporal attention mechanism. Video-level fusion captures temporal dependencies across frames. Experiments on DFDM and GANGen-Detection show CapST outperforms baseline models in attribution accuracy while reducing computational cost. |
| title | CapST: Leveraging Capsule Networks and Temporal Attention for Accurate Model Attribution in Deep-fake Videos |
| topic | Computer Vision and Pattern Recognition acmart |
| url | https://arxiv.org/abs/2311.03782 |