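| Main Authors: | Doropoulos, Stavros; Karapalidou, Elisavet; Charitidis, Polychronis; Karakeva, Sophia; Vologiannidis, Stavros |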
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2025 |
| Online Access: | https://doi.org/10.5281/zenodo.15767938 |
| _version_ | 1866902258402197504 |
|---|---|
| author | Doropoulos, Stavros; Karapalidou, Elisavet; Charitidis, Polychronis; Karakeva, Sophia; Vologiannidis, Stavros |
| contents | <p>This dataset accompanies the study <em>Beyond Manual Media Coding: Evaluating Large Language Models and Agents for News Content Analysis</em>.</p> <p>It provides a reproducible benchmark for evaluating automated content analysis methods against human-annotated ground truth.</p> <p>The dataset includes:</p> <ul> <li> <p><strong><code>articles.csv</code></strong><br>Contains the 200 news articles collected for this study, each with:</p> <ul> <li> <p><code>id</code>: unique identifier</p> </li> <li> <p><code>url</code>: source URL of the original article</p> </li> <li> <p><code>content</code>: full text of the news article</p> </li> </ul> </li> <li> <p><strong><code>codebook.json</code></strong><br>A structured JSON file defining the 26-question analysis codebook used for annotation.<br>Each question entry specifies:</p> <ul> <li> <p><code>questionId</code>: question ID (e.g., Q1)</p> </li> <li> <p><code>prompt</code>: annotation question text</p> </li> <li> <p><code>questionAnswerType</code>: answer type (SINGLE_CHOICE or MULTI_CHOICE)</p> </li> <li> <p><code>eligibleQuestionAnswers</code>: list of possible tags / codes</p> </li> </ul> </li> <li> <p><strong><code>annotations.json</code></strong><br>Contains the complete human annotation data.<br>For each article <code>id</code>, it provides the responses to all 26 codebook questions as determined by an expert annotator; these responses serve as the ground-truth labels.</p> </li> </ul> <h2>Intended use</h2> <ul> <li> <p>Designed for research purposes, including natural language understanding, content classification, and LLM evaluation.</p> </li> <li><strong>Please request access with your academic email.</strong></li> </ul> |
| format | Digital resource |
| id | zenodo_https___doi_org_10_5281_zenodo_15767938 |
| institution | Zenodo |
| language | |
| publishDate | 2025 |
| publisher | Zenodo |
| record_format | zenodo |
| title | Media Coding Dataset for News Content Analysis |
| url | https://doi.org/10.5281/zenodo.15767938 |
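
The file descriptions above suggest how model output could be checked against the human ground truth. The following is a minimal sketch, not the authors' evaluation code. The field names `questionId`, `questionAnswerType`, and `eligibleQuestionAnswers` come from the record. Everything else is an assumption made for illustration: that `codebook.json` is a top-level list of question entries, that eligible answers are plain strings, that annotations can be arranged as `{article_id: {questionId: [codes]}}`, and the helper names themselves.

```python
import json


def load_codebook(path):
    """Index codebook entries by questionId.

    ASSUMPTION: the file is a JSON list of question entries.
    """
    with open(path, encoding="utf-8") as f:
        return {q["questionId"]: q for q in json.load(f)}


def is_valid_answer(question, answers):
    """Check a list of answer codes against one codebook entry.

    ASSUMPTION: eligibleQuestionAnswers is a list of plain strings.
    """
    eligible = set(question["eligibleQuestionAnswers"])
    if question["questionAnswerType"] == "SINGLE_CHOICE" and len(answers) != 1:
        return False
    return len(answers) > 0 and set(answers) <= eligible


def per_question_accuracy(gold, predicted):
    """Exact-match accuracy per question.

    ASSUMPTION: both arguments look like {article_id: {questionId: [codes]}}.
    A missing prediction counts as wrong; MULTI_CHOICE answers are compared
    as sets, so code order does not matter.
    """
    hits, totals = {}, {}
    for article_id, gold_answers in gold.items():
        for qid, gold_codes in gold_answers.items():
            pred_codes = predicted.get(article_id, {}).get(qid, [])
            totals[qid] = totals.get(qid, 0) + 1
            hits[qid] = hits.get(qid, 0) + (set(pred_codes) == set(gold_codes))
    return {qid: hits[qid] / totals[qid] for qid in totals}
```

Exact set match is only one possible scoring rule. For MULTI_CHOICE questions a partial-credit measure such as per-code F1 may be more informative, and the accompanying study may use a different metric.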