Bibliographic Details
Main Authors: Soprano, Michael, Cioci, Andrea, Mizzaro, Stefano
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2605.04797
_version_ 1866915983928590336
author Soprano, Michael
Cioci, Andrea
Mizzaro, Stefano
contents Deepfakes are increasingly realistic and easy to produce, raising concerns about the reliability of human judgments in misinformation settings. We study audiovisual deepfake detection by measuring how consistently crowd workers distinguish authentic from manipulated videos and, when they flag a video as manipulated, how accurately they identify the manipulation type (audio-only, video-only, or audio-video) and how consistently they report manipulation timestamps. We run two matched crowdsourcing studies on Prolific using AV-Deepfake1M and the Trusted Media Challenge (TMC) dataset. We sample 48 videos per dataset (96 total) and collect 960 judgments (10 per video). Results show that crowd workers rarely misclassify authentic videos as manipulated, but they miss many manipulations, and agreement remains limited across videos. Aggregating multiple judgments per video stabilizes the authenticity signal, but it cannot recover manipulations that most workers consistently miss. Manipulation type identification is substantially noisier than authenticity detection even when workers detect a manipulation, with joint audio-video cases being particularly hard to recognize. Overall, these findings suggest that crowdsourcing can provide a scalable screening signal for audiovisual authenticity, while reliable modality attribution remains an open challenge.
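The aggregation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which aggregation rule the authors use, so simple majority voting is assumed here; the label names, the video identifiers, and the toy judgments are all invented for illustration and are not data from the study. The third toy video shows why aggregation cannot recover a manipulation that most workers miss.

```python
from collections import Counter, defaultdict

# Hypothetical label names; the record does not specify the study's label scheme.
AUTHENTIC = "authentic"
MANIPULATED = "manipulated"


def majority_vote(judgments):
    """Aggregate (video_id, label) pairs into one label per video.

    Returns a dict mapping video_id to the most frequent label,
    or to "tie" when the two top labels are equally frequent.
    """
    labels_by_video = defaultdict(list)
    for video_id, label in judgments:
        labels_by_video[video_id].append(label)

    aggregated = {}
    for video_id, labels in labels_by_video.items():
        ranked = Counter(labels).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            aggregated[video_id] = "tie"
        else:
            aggregated[video_id] = ranked[0][0]
    return aggregated


# Invented toy data: 10 judgments per video, mirroring the study's design.
toy_judgments = (
    [("real_clip", AUTHENTIC)] * 9 + [("real_clip", MANIPULATED)] * 1
    + [("obvious_fake", MANIPULATED)] * 7 + [("obvious_fake", AUTHENTIC)] * 3
    + [("subtle_fake", AUTHENTIC)] * 8 + [("subtle_fake", MANIPULATED)] * 2
)

result = majority_vote(toy_judgments)
print(result)
```

In this toy setup the vote settles on the correct label for the first two videos, but the third, a manipulation that eight of ten hypothetical workers judged authentic, is still labeled authentic after aggregation.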
format Preprint
id arxiv_https___arxiv_org_abs_2605_04797
institution arXiv
publishDate 2026
record_format arxiv
title Beyond Seeing Is Believing: On Crowdsourced Detection of Audiovisual Deepfakes
topic Information Retrieval
Artificial Intelligence
H.5.1; H.3.3; I.2.6
url https://arxiv.org/abs/2605.04797