

Bibliographic Details
Main Authors:	Marik, Aritra; Klemt, Marcel; Rohrbach, Anna
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.19630

_version_	1866911698300960768
author	Marik, Aritra; Klemt, Marcel; Rohrbach, Anna
author_facet	Marik, Aritra; Klemt, Marcel; Rohrbach, Anna
contents	With every advancement in generative AI models, forensics is under increasing pressure. The constant emergence of new generation techniques makes it impossible to collect data for each manipulation to train a deepfake detection model. Thus, generalizing to deepfakes unseen during training is one of the major challenges in current deepfake detection research. To tackle this challenge, we employ high-level semantic cues and argue that these cues can support low-level focused approaches in generalizing to unseen types of manipulations. In this work, we study emotions as a high-level semantic cue. We propose Emo-Boost, a multimodal deepfake detection framework that fuses an off-the-shelf RGB- and acoustic-focused deepfake detector with our emotion-based deepfake detector EmoForensics. EmoForensics utilises vision and audio emotion recognition modules and models intra- and inter-modal temporal consistency in emotion representations from an audio-visual stream. We found that EmoForensics and the low-level focused method capture complementary signals. Consequently, combining both signals in Emo-Boost enhances the average cross-manipulation generalization AUC by 2.1% on FakeAVCeleb.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_19630
institution	arXiv
publishDate	2026
record_format	arxiv
title	EMO-BOOST: Emotion-Augmented Audio-Visual Features for Improved Generalization in Deepfake Detection
topic	Artificial Intelligence
url	https://arxiv.org/abs/2605.19630
