Staff View: ForeCite: Adapting Pre-Trained Language Models to Predict Future Citation Rates of Academic Papers :: Library Catalog

Bibliographic Details
Main Authors:	Hull, Gavin; Bihlo, Alex
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2505.08941

_version_	1866908362596155392
author	Hull, Gavin; Bihlo, Alex
author_facet	Hull, Gavin; Bihlo, Alex
contents	Predicting the future citation rates of academic papers is an important step toward the automation of research evaluation and the acceleration of scientific progress. We present $\textbf{ForeCite}$, a simple but powerful framework to append pre-trained causal language models with a linear head for average monthly citation rate prediction. Adapting transformers for regression tasks, ForeCite achieves a test correlation of $\rho = 0.826$ on a curated dataset of 900K+ biomedical papers published between 2000 and 2024, a 27-point improvement over the previous state-of-the-art. Comprehensive scaling-law analysis reveals consistent gains across model sizes and data volumes, while temporal holdout experiments confirm practical robustness. Gradient-based saliency heatmaps suggest a potentially undue reliance on titles and abstract texts. These results establish a new state-of-the-art in forecasting the long-term influence of academic research and lay the groundwork for the automated, high-fidelity evaluation of scientific contributions.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_08941
institution	arXiv
publishDate	2025
record_format	arxiv
title	ForeCite: Adapting Pre-Trained Language Models to Predict Future Citation Rates of Academic Papers
topic	Machine Learning; Computation and Language
url	https://arxiv.org/abs/2505.08941