Staff View: Media Coding Dataset for News Content Analysis :: Library Catalog

Bibliographic Details
Main Authors:	Doropoulos, Stavros; Karapalidou, Elisavet; Charitidis, Polychronis; Karakeva, Sophia; Vologiannidis, Stavros
Format:	Digital resource
Language:
Published:	Zenodo 2025
Online Access:	https://doi.org/10.5281/zenodo.15767938

_version_	1866902258402197504
author	Doropoulos, Stavros; Karapalidou, Elisavet; Charitidis, Polychronis; Karakeva, Sophia; Vologiannidis, Stavros
author_facet	Doropoulos, Stavros; Karapalidou, Elisavet; Charitidis, Polychronis; Karakeva, Sophia; Vologiannidis, Stavros
contents	<p>This dataset accompanies the study Beyond Manual Media Coding: Evaluating Large Language Models and Agents for News Content Analysis.</p> <p>It provides a reproducible benchmark for evaluating automated content analysis methods against human-annotated ground truth.</p> <p>The dataset includes:</p> <ul> <li> <p><strong><code>articles.csv</code></strong><br>Contains the 200 news articles collected for this study, each with:</p> <ul> <li> <p><code>id</code>: unique identifier</p> </li> <li> <p><code>url</code>: source URL of the original article</p> </li> <li> <p><code>content</code>: full text of the news article</p> </li> </ul> </li> <li> <p><strong><code>codebook.json</code></strong><br>A structured JSON file defining the 26-question analysis codebook used for annotation.<br>Each question entry specifies:</p> <ul> <li> <p><code>questionId</code>: question ID (e.g., Q1)</p> </li> <li> <p><code>prompt</code>: annotation question text</p> </li> <li> <p><code>questionAnswerType</code>: answer type (SINGLE_CHOICE or MULTI_CHOICE)</p> </li> <li> <p><code>eligibleQuestionAnswers</code>: list of possible tags or codes</p> </li> </ul> </li> <li> <p><strong><code>annotations.json</code></strong><br>Contains the complete human annotation data.<br>For each article <code>id</code>, it provides an expert annotator's responses to all 26 codebook questions; these responses serve as the ground-truth labels.</p> </li> </ul> <h2>Intended use</h2> <ul> <li> <p>Designed for research purposes, including natural language understanding, content classification, and LLM evaluation.</p> </li> <li><strong>Please request access using your academic email address.</strong></li> </ul>
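The file descriptions above suggest a simple workflow: load the three files, check each annotation against the codebook, and then compare model output with the expert labels. The Python sketch below illustrates that workflow and rests on several assumptions that the record does not confirm. It assumes that codebook.json is a JSON array of question entries, that eligibleQuestionAnswers holds plain string codes, and that annotations.json maps each article id to a list of objects with a questionId and a hypothetical "answers" list. The helper names (load_articles, load_codebook, load_annotations, score) are also hypothetical. The scoring metrics are an illustrative choice and not necessarily those used in the accompanying study: exact match for SINGLE_CHOICE questions and Jaccard similarity for MULTI_CHOICE questions.

```python
"""Illustrative loader and scorer for the Media Coding Dataset files.

ASSUMPTIONS (not confirmed by the dataset record):
- codebook.json is a JSON array of question entries;
- eligibleQuestionAnswers holds plain string codes;
- annotations.json maps each article id to a list of objects with
  "questionId" and a hypothetical "answers" list.
"""
import csv
import io
import json
from typing import Dict, Set, TextIO

# article id -> question id -> set of answer codes
Labels = Dict[str, Dict[str, Set[str]]]


def load_articles(stream: TextIO) -> Dict[str, dict]:
    """Read articles.csv (columns: id, url, content), keyed by id."""
    return {row["id"]: row for row in csv.DictReader(stream)}


def load_codebook(stream: TextIO) -> Dict[str, dict]:
    """Index codebook entries by questionId."""
    return {q["questionId"]: q for q in json.load(stream)}


def load_annotations(stream: TextIO, codebook: Dict[str, dict]) -> Labels:
    """Parse annotations and reject answers the codebook does not allow."""
    labels: Labels = {}
    for article_id, responses in json.load(stream).items():
        per_question: Dict[str, Set[str]] = {}
        for response in responses:
            question = codebook[response["questionId"]]  # KeyError if unknown
            answers = set(response["answers"])  # hypothetical field name
            qid = question["questionId"]
            invalid = answers - set(question["eligibleQuestionAnswers"])
            if invalid:
                raise ValueError(f"article {article_id}, {qid}: invalid codes {sorted(invalid)}")
            if question["questionAnswerType"] == "SINGLE_CHOICE" and len(answers) > 1:
                raise ValueError(f"article {article_id}, {qid}: more than one answer")
            per_question[qid] = answers
        labels[str(article_id)] = per_question
    return labels


def score(gold: Labels, pred: Labels, codebook: Dict[str, dict]) -> Dict[str, float]:
    """Mean per-question agreement with the ground truth.

    SINGLE_CHOICE: exact match (1.0 or 0.0).
    MULTI_CHOICE: Jaccard similarity |gold & pred| / |gold | pred|.
    A missing prediction counts as an empty answer set. These metrics are
    an illustrative choice, not necessarily those used in the study.
    """
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for article_id, gold_by_question in gold.items():
        pred_by_question = pred.get(article_id, {})
        for qid, gold_answers in gold_by_question.items():
            guess = pred_by_question.get(qid, set())
            if codebook[qid]["questionAnswerType"] == "SINGLE_CHOICE":
                value = float(guess == gold_answers)
            else:
                union = gold_answers | guess
                value = len(gold_answers & guess) / len(union) if union else 1.0
            sums[qid] = sums.get(qid, 0.0) + value
            counts[qid] = counts.get(qid, 0) + 1
    return {qid: sums[qid] / counts[qid] for qid in sums}


if __name__ == "__main__":
    # Toy data invented for illustration; it is not taken from the dataset.
    codebook = load_codebook(io.StringIO(json.dumps([
        {"questionId": "Q1", "prompt": "Main topic?", "questionAnswerType": "SINGLE_CHOICE",
         "eligibleQuestionAnswers": ["politics", "economy"]},
        {"questionId": "Q2", "prompt": "Actors mentioned?", "questionAnswerType": "MULTI_CHOICE",
         "eligibleQuestionAnswers": ["government", "citizens", "experts"]},
    ])))
    gold = load_annotations(io.StringIO(json.dumps({
        "1": [{"questionId": "Q1", "answers": ["politics"]},
              {"questionId": "Q2", "answers": ["government", "experts"]}],
    })), codebook)
    pred = {"1": {"Q1": {"politics"}, "Q2": {"government"}}}
    print(score(gold, pred, codebook))  # {'Q1': 1.0, 'Q2': 0.5}
```

To read the real files, the same functions accept open file handles. For example: `with open("articles.csv", newline="", encoding="utf-8") as f: articles = load_articles(f)`. The UTF-8 encoding is an assumption. Storing answers as sets makes the comparison order-insensitive, which suits MULTI_CHOICE responses.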
format	Digital resource
id	zenodo_https___doi_org_10_5281_zenodo_15767938
institution	Zenodo
language
publishDate	2025
publisher	Zenodo
record_format	zenodo
title	Media Coding Dataset for News Content Analysis
url	https://doi.org/10.5281/zenodo.15767938

Similar Items