Bibliographic Details
Main Authors: Yang, Rui, Tong, Jiayi, Wang, Haoyuan, Huang, Hui, Hu, Ziyang, Li, Peiyu, Liu, Nan, Lindsell, Christopher J., Pencina, Michael J., Chen, Yong, Hong, Chuan
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2503.13857
author Yang, Rui
Tong, Jiayi
Wang, Haoyuan
Huang, Hui
Hu, Ziyang
Li, Peiyu
Liu, Nan
Lindsell, Christopher J.
Pencina, Michael J.
Chen, Yong
Hong, Chuan
contents Background. Systematic reviews in comparative effectiveness research require timely evidence synthesis. Preprints accelerate knowledge dissemination but vary in quality, posing challenges for systematic reviews. Methods. We propose AutoConfidence (automated confidence assessment), an advanced framework for predicting preprint publication, which reduces reliance on manual curation and expands the range of predictors, including three key advancements: (1) automated data extraction using natural language processing techniques, (2) semantic embeddings of titles and abstracts, and (3) large language model (LLM)-driven evaluation scores. Additionally, we employed two prediction models: a random forest classifier for binary outcome and a survival cure model that predicts both binary outcome and publication risk over time. Results. The random forest classifier achieved AUROC 0.692 with LLM-driven scores, improving to 0.733 with semantic embeddings and 0.747 with article usage metrics. The survival cure model reached AUROC 0.716 with LLM-driven scores, improving to 0.731 with semantic embeddings. For publication risk prediction, it achieved a concordance index of 0.658, increasing to 0.667 with semantic embeddings. Conclusion. Our study advances the framework for preprint publication prediction through automated data extraction and multiple feature integration. By combining semantic embeddings with LLM-driven evaluations, AutoConfidence enhances predictive performance while reducing manual annotation burden. The framework has the potential to facilitate incorporation of preprint articles during the appraisal phase of systematic reviews, supporting researchers in more effective utilization of preprint resources.
format Preprint
id arxiv_https___arxiv_org_abs_2503_13857
institution arXiv
publishDate 2025
record_format arxiv
title Enabling Inclusive Systematic Reviews: Incorporating Preprint Articles with Large Language Model-Driven Evaluations
topic Computation and Language
url https://arxiv.org/abs/2503.13857
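
The contents field above reports a concordance index for publication risk prediction (0.658, rising to 0.667 with semantic embeddings). The record does not say which variant the authors computed, so the following is only a minimal sketch of Harrell's C-index, assuming that an "event" means the preprint was published, that `times` holds days until publication or censoring, and that a higher risk score means earlier expected publication. The function name, argument names, and toy data are illustrative and do not come from the paper.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C: share of comparable pairs whose risk ordering matches
    the observed ordering of event times (tied scores count as half)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A pair is only comparable when the earlier time is an observed event.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four preprints, the third still unpublished (censored).
toy_times = [5, 10, 15, 20]
toy_events = [True, True, False, True]
toy_risk = [0.9, 0.6, 0.7, 0.2]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
print(round(toy_c, 3))  # 0.8 (4 of 5 comparable pairs are ordered correctly)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for the 0.658 and 0.667 figures in the abstract. This sketch ignores tied event times, which production implementations usually handle explicitly.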