

Bibliographic Details
Main Authors:	Doropoulos, Stavros, Karapalidou, Elisavet, Charitidis, Polychronis, Karakeva, Sophia, Vologiannidis, Stavros
Format:	Digital resource
Language:
Published:	Zenodo 2025
Online Access:	https://doi.org/10.5281/zenodo.15767938

Table of Contents:

This dataset accompanies the study "Beyond Manual Media Coding: Evaluating Large Language Models and Agents for News Content Analysis". It provides a reproducible benchmark for evaluating automated content analysis methods against human-annotated ground truth. The dataset includes:
<ul>
  <li> <code>articles.csv</code>: contains the 200 news articles collected for this study, each with:
    <ul>
      <li> <code>id</code>: unique identifier </li>
      <li> <code>url</code>: source URL of the original article </li>
      <li> <code>content</code>: full text of the news article </li>
    </ul>
  </li>
  <li> <code>codebook.json</code>: a structured JSON file defining the 26-question analysis codebook used for annotation. Each question entry specifies:
    <ul>
      <li> <code>questionId</code>: question ID (e.g., Q1) </li>
      <li> <code>prompt</code>: annotation question text </li>
      <li> <code>questionAnswerType</code>: answer type (SINGLE_CHOICE or MULTI_CHOICE) </li>
      <li> <code>eligibleQuestionAnswers</code>: list of possible tags / codes </li>
    </ul>
  </li>
  <li> <code>annotations.json</code>: contains the complete human annotation data. For each article <code>id</code>, it provides the responses to all 26 codebook questions as determined by an expert annotator; these responses serve as the ground truth labels. </li>
</ul>
<h2>Intended use</h2>
<ul>
  <li> Designed for research purposes, including natural language understanding, content classification, and LLM evaluation. </li>
  <li> Please request access using your academic email address. </li>
</ul>
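The file descriptions above suggest a simple loading and validation workflow, sketched here in Python. This is a minimal illustration, not the authors' tooling: it assumes that <code>codebook.json</code> is a top-level list of question objects and that <code>eligibleQuestionAnswers</code> is a list of plain strings, neither of which the record confirms. The helper names (<code>load_articles</code>, <code>index_codebook</code>, <code>is_valid_answer</code>) and the tiny in-memory sample are hypothetical.

```python
import csv
import io
import json


def load_articles(fileobj):
    """Read articles.csv-style rows (id, url, content) into a dict keyed by id."""
    return {row["id"]: row for row in csv.DictReader(fileobj)}


def index_codebook(questions):
    """Index codebook entries by questionId.

    Assumption: the codebook is a list of question objects.
    """
    return {q["questionId"]: q for q in questions}


def is_valid_answer(question, answers):
    """Check a list of answer codes against one codebook question.

    Assumption: eligibleQuestionAnswers is a list of plain strings.
    SINGLE_CHOICE requires exactly one eligible code; MULTI_CHOICE
    accepts any subset of the eligible codes (including an empty one).
    """
    eligible = set(question["eligibleQuestionAnswers"])
    if not set(answers) <= eligible:
        return False
    if question["questionAnswerType"] == "SINGLE_CHOICE":
        return len(answers) == 1
    return True


# Hypothetical in-memory stand-ins for the real files.
sample_csv = io.StringIO("id,url,content\n1,https://example.org/a,Some text\n")
articles = load_articles(sample_csv)

sample_codebook = json.loads(
    '[{"questionId": "Q1", "prompt": "Hypothetical question?",'
    ' "questionAnswerType": "SINGLE_CHOICE",'
    ' "eligibleQuestionAnswers": ["YES", "NO"]}]'
)
codebook = index_codebook(sample_codebook)

print(is_valid_answer(codebook["Q1"], ["YES"]))        # one eligible code
print(is_valid_answer(codebook["Q1"], ["YES", "NO"]))  # too many for SINGLE_CHOICE
```

With the real files, the same functions could be applied to <code>open("articles.csv", newline="", encoding="utf-8")</code> and to the parsed contents of <code>codebook.json</code>; running model outputs through a check like <code>is_valid_answer</code> before scoring them against <code>annotations.json</code> helps keep malformed predictions from being counted as ordinary disagreements.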

Similar Items