Staff View: Beyond Seeing Is Believing: On Crowdsourced Detection of Audiovisual Deepfakes :: Library Catalog

Bibliographic Details
Main Authors:	Soprano, Michael; Cioci, Andrea; Mizzaro, Stefano
Format:	Preprint
Published:	2026
Subjects:	Information Retrieval; Artificial Intelligence; H.5.1; H.3.3; I.2.6
Online Access:	https://arxiv.org/abs/2605.04797

_version_	1866915983928590336
author	Soprano, Michael; Cioci, Andrea; Mizzaro, Stefano
author_facet	Soprano, Michael; Cioci, Andrea; Mizzaro, Stefano
contents	Deepfakes are increasingly realistic and easy to produce, raising concerns about the reliability of human judgments in misinformation settings. We study audiovisual deepfake detection by measuring how consistently crowd workers distinguish authentic from manipulated videos and, when they flag a video as manipulated, how accurately they identify the manipulation type (audio-only, video-only, or audio-video) and how consistently they report manipulation timestamps. We run two matched crowdsourcing studies on Prolific using AV-Deepfake1M and the Trusted Media Challenge (TMC) dataset. We sample 48 videos per dataset (96 total) and collect 960 judgments (10 per video). Results show that crowd workers rarely misclassify authentic videos as manipulated, but they miss many manipulations, and agreement remains limited across videos. Aggregating multiple judgments per video stabilizes the authenticity signal, but it cannot recover manipulations that most workers consistently miss. Manipulation type identification is substantially noisier than authenticity detection even when workers detect a manipulation, with joint audio-video cases being particularly hard to recognize. Overall, these findings suggest that crowdsourcing can provide a scalable screening signal for audiovisual authenticity, while reliable modality attribution remains an open challenge.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_04797
institution	arXiv
publishDate	2026
record_format	arxiv
title	Beyond Seeing Is Believing: On Crowdsourced Detection of Audiovisual Deepfakes
topic	Information Retrieval; Artificial Intelligence; H.5.1; H.3.3; I.2.6
url	https://arxiv.org/abs/2605.04797
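The abstract notes that aggregating the 10 judgments per video stabilizes the authenticity signal but cannot recover manipulations most workers miss. A simple binomial model can illustrate why. The sketch below assumes majority voting as the aggregation rule and assumes that workers flag a given manipulated video independently, all with the same probability p. The aggregation rule, the independence assumption, the tie handling, and the function name majority_flag_probability are hypothetical illustrations, not the authors' method.

```python
from math import comb


def majority_flag_probability(p: float, n: int) -> float:
    """Probability that a strict majority of n workers flags a video as manipulated.

    Hypothetical model: each worker independently flags the video with the
    same probability p. Ties (possible when n is even) count as "not flagged".
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    threshold = n // 2 + 1  # smallest vote count that is a strict majority
    # Sum the binomial probabilities of every vote count at or above the threshold.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(threshold, n + 1))


if __name__ == "__main__":
    # Illustrative per-worker detection rates, not values reported in the study.
    for p in (0.7, 0.3):
        single = majority_flag_probability(p, 1)
        pooled = majority_flag_probability(p, 10)
        print(f"p={p}: one judgment {single:.3f}, majority of 10 {pooled:.3f}")
```

With these illustrative rates, pooling ten judgments raises the flag rate from 0.70 to about 0.85 for a manipulation most workers catch. For one most workers miss, it lowers the rate from 0.30 to about 0.05. Adding judgments amplifies whichever way the typical worker leans. It does not correct a shared blind spot.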

Similar Items