
Bibliographic Details
Main Authors:	Dao, An; Tran, Vu; Nguyen, Le-Minh; Matsumoto, Yuji
Format:	Preprint
Published:	2025
Subjects:	Digital Libraries; Computation and Language
Online Access:	https://arxiv.org/abs/2509.24283

Abstract:

We present an overview of the SCIDOCA 2025 Shared Task, which focuses on citation discovery and prediction in scientific documents. The task is divided into three subtasks: (1) Citation Discovery, where systems must identify relevant references for a given paragraph; (2) Masked Citation Prediction, which requires selecting the correct citation for masked citation slots; and (3) Citation Sentence Prediction, where systems must determine the correct reference for each cited sentence. We release a large-scale dataset constructed from the Semantic Scholar Open Research Corpus (S2ORC), containing over 60,000 annotated paragraphs and a curated reference set. The test set consists of 1,000 paragraphs from distinct papers, each annotated with ground-truth citations and distractor candidates. A total of seven teams registered, with three submitting results. We report performance metrics across all subtasks and analyze the effectiveness of submitted systems. This shared task provides a new benchmark for evaluating citation modeling and encourages future research in scientific document understanding. The dataset and task materials are publicly available at https://github.com/daotuanan/scidoca2025-shared-task.
