Bibliographic Details
Main Authors: Dufour, Nicholas; Pathak, Arkanath; Samangouei, Pouya; Hariri, Nikki; Deshetti, Shashi; Dudfield, Andrew; Guess, Christopher; Escayola, Pablo Hernández; Tran, Bobby; Babakar, Mevan; Bregler, Christoph
Format: Preprint
Published: 2024
Subjects: Computers and Society
Online Access: https://arxiv.org/abs/2405.11697
author Dufour, Nicholas
Pathak, Arkanath
Samangouei, Pouya
Hariri, Nikki
Deshetti, Shashi
Dudfield, Andrew
Guess, Christopher
Escayola, Pablo Hernández
Tran, Bobby
Babakar, Mevan
Bregler, Christoph
contents The prevalence and harms of online misinformation are a perennial concern for internet platforms, institutions and society at large. Over time, information shared online has become more media-heavy and misinformation has readily adapted to these new modalities. The rise of generative AI-based tools, which provide widely accessible methods for synthesizing realistic audio, images, video and human-like text, has amplified these concerns. Despite intense public interest and significant press coverage, quantitative information on the prevalence and modality of media-based misinformation remains scarce. Here, we present the results of a two-year study using human raters to annotate online media-based misinformation, mostly focusing on images, based on claims assessed in a large sample of publicly accessible fact checks with the ClaimReview markup. We present an image typology designed to capture aspects of the image and manipulation relevant to the image's role in the misinformation claim. We visualize the distribution of these types over time. We show the rise of generative AI-based content in misinformation claims, and that its prevalence is a relatively recent phenomenon, occurring significantly after heavy press coverage. We also show that "simple" methods, particularly context manipulations, dominated historically and continued to hold a majority as of the end of data collection in November 2023. The dataset, Annotated Misinformation, Media-Based (AMMeBa), is publicly available, and we hope that these data will serve as both a means of evaluating mitigation methods in a realistic setting and as a first-of-its-kind census of the types and modalities of online misinformation.
format Preprint
id arxiv_https___arxiv_org_abs_2405_11697
institution arXiv
publishDate 2024
record_format arxiv
title AMMeBa: A Large-Scale Survey and Dataset of Media-Based Misinformation In-The-Wild
topic Computers and Society
url https://arxiv.org/abs/2405.11697