Bibliographic Details
Main Authors: Chen, Hong, Teplitskiy, Misha, Jurgens, David
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2502.20581
author Chen, Hong
Teplitskiy, Misha
Jurgens, David
contents Academic citations are widely used for evaluating research and tracing knowledge flows. Such uses typically rely on raw citation counts and neglect variability in citation types. In particular, citations can vary in their fidelity as original knowledge from cited studies may be paraphrased, summarized, or reinterpreted, possibly wrongly, leading to variation in how much information changes from cited to citing paper. In this study, we introduce a computational pipeline to quantify citation fidelity at scale. Using full texts of papers, the pipeline identifies citations in citing papers and the corresponding claims in cited papers, and applies supervised models to measure fidelity at the sentence level. Analyzing a large-scale multi-disciplinary dataset of approximately 13 million citation sentence pairs, we find that citation fidelity is higher when authors cite papers that are 1) more recent and intellectually close, 2) more accessible, and 3) the first author has a lower H-index and the author team is medium-sized. Using a quasi-experiment, we establish the "telephone effect" - when citing papers have low fidelity to the original claim, future papers that cite the citing paper and the original have lower fidelity to the original. Our work reveals systematic differences in citation fidelity, underscoring the limitations of analyses that rely on citation quantity alone and the potential for distortion of evidence.
format Preprint
id arxiv_https___arxiv_org_abs_2502_20581
institution arXiv
publishDate 2025
record_format arxiv
title The Noisy Path from Source to Citation: Measuring How Scholars Engage with Past Research
topic Computation and Language
url https://arxiv.org/abs/2502.20581