Bibliographic Details
Main Authors: Doropoulos, Stavros; Karapalidou, Elisavet; Charitidis, Polychronis; Karakeva, Sophia; Vologiannidis, Stavros
Format: Digital resource
Language:
Published: Zenodo 2025
Online Access: https://doi.org/10.5281/zenodo.15767938
Table of Contents:
  • <p>This dataset accompanies the study <em>Beyond Manual Media Coding: Evaluating Large Language Models and Agents for News Content Analysis</em>.</p> <p>It provides a reproducible benchmark for evaluating automated content analysis methods against human-annotated ground truth.</p> <p>The dataset includes:</p> <ul> <li> <p><strong><code>articles.csv</code></strong><br>Contains the 200 news articles collected for this study, each with:</p> <ul> <li> <p><code>id</code>: unique identifier</p> </li> <li> <p><code>url</code>: source URL of the original article</p> </li> <li> <p><code>content</code>: full text of the news article</p> </li> </ul> </li> <li> <p><strong><code>codebook.json</code></strong><br>A structured JSON file defining the 26-question analysis codebook used for annotation.<br>Each question entry specifies:</p> <ul> <li> <p><code>questionId</code>: question ID (e.g., Q1)</p> </li> <li> <p><code>prompt</code>: annotation question text</p> </li> <li> <p><code>questionAnswerType</code>: answer type (SINGLE_CHOICE or MULTI_CHOICE)</p> </li> <li> <p><code>eligibleQuestionAnswers</code>: list of possible tags / codes</p> </li> </ul> </li> <li> <p><strong><code>annotations.json</code></strong><br>Contains the complete human annotation data.<br>For each article <code>id</code>, it lists the responses to all 26 codebook questions as determined by an expert annotator; these responses serve as the ground truth labels.</p> </li> </ul> <h2>Intended use</h2> <ul> <li> <p>Designed for research purposes, including natural language understanding, content classification, and LLM evaluation.</p> </li> <li><strong>Please request access with your academic email.</strong></li> </ul>
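The codebook fields described above suggest a simple way to check model output or annotations against the allowed codes. The following Python sketch is illustrative only: the sample entries, the helper names `index_codebook` and `validate_answers`, and the assumption that `eligibleQuestionAnswers` is a flat list of strings are all hypothetical, and the actual layout of `codebook.json` and `annotations.json` may differ.

```python
import json

# Hypothetical sample that mirrors the documented codebook fields.
# The real codebook.json may nest or type these fields differently.
SAMPLE_CODEBOOK = json.loads("""
[
  {"questionId": "Q1",
   "prompt": "Example single-choice question",
   "questionAnswerType": "SINGLE_CHOICE",
   "eligibleQuestionAnswers": ["A", "B", "C"]},
  {"questionId": "Q2",
   "prompt": "Example multi-choice question",
   "questionAnswerType": "MULTI_CHOICE",
   "eligibleQuestionAnswers": ["X", "Y", "Z"]}
]
""")


def index_codebook(entries):
    """Map each questionId to its codebook entry for quick lookup."""
    return {entry["questionId"]: entry for entry in entries}


def validate_answers(question, answers):
    """Return a list of problems found in `answers`; empty means valid."""
    problems = []
    eligible = set(question["eligibleQuestionAnswers"])
    unknown = [a for a in answers if a not in eligible]
    if unknown:
        problems.append(f"unknown codes: {unknown}")
    # Assumption: SINGLE_CHOICE questions take exactly one code.
    if question["questionAnswerType"] == "SINGLE_CHOICE" and len(answers) != 1:
        problems.append(f"expected exactly 1 answer, got {len(answers)}")
    return problems


codebook = index_codebook(SAMPLE_CODEBOOK)
print(validate_answers(codebook["Q1"], ["A"]))
print(validate_answers(codebook["Q1"], ["A", "B"]))
print(validate_answers(codebook["Q2"], ["X", "W"]))
```

With the real files, `SAMPLE_CODEBOOK` would be replaced by the parsed contents of `codebook.json`, and the same check could be run over each article's responses once the structure of `annotations.json` has been inspected.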