Bibliographic Details
Main Authors: Bron, Michiel P., van der Heijden, Peter G. M., Feelders, Ad J., Siebes, Arno P. J. M.
Format: Preprint
Published: 2024
Subjects: Information Retrieval
Online Access: https://arxiv.org/abs/2404.01176
_version_ 1866914737253515264
author Bron, Michiel P.
van der Heijden, Peter G. M.
Feelders, Ad J.
Siebes, Arno P. J. M.
author_facet Bron, Michiel P.
van der Heijden, Peter G. M.
Feelders, Ad J.
Siebes, Arno P. J. M.
contents Technology-Assisted Review (TAR) aims to reduce the human effort required for screening processes such as abstract screening for systematic literature reviews. Human reviewers label documents as relevant or irrelevant during this process, while the system incrementally updates a prediction model based on the reviewers' previous decisions. After each model update, the system proposes new documents it deems relevant, to prioritize relevant documents over irrelevant ones. A stopping criterion is necessary to guide users in stopping the review process to minimize the number of missed relevant documents and the number of read irrelevant documents. In this paper, we propose and evaluate a new ensemble-based Active Learning strategy and a stopping criterion based on Chao's Population Size Estimator that estimates the prevalence of relevant documents in the dataset. Our simulation study demonstrates that this criterion performs well on several datasets and is compared to other methods presented in the literature.
format Preprint
id arxiv_https___arxiv_org_abs_2404_01176
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Using Chao's Estimator as a Stopping Criterion for Technology-Assisted Review
Bron, Michiel P.
van der Heijden, Peter G. M.
Feelders, Ad J.
Siebes, Arno P. J. M.
Information Retrieval
Technology-Assisted Review (TAR) aims to reduce the human effort required for screening processes such as abstract screening for systematic literature reviews. Human reviewers label documents as relevant or irrelevant during this process, while the system incrementally updates a prediction model based on the reviewers' previous decisions. After each model update, the system proposes new documents it deems relevant, to prioritize relevant documents over irrelevant ones. A stopping criterion is necessary to guide users in stopping the review process to minimize the number of missed relevant documents and the number of read irrelevant documents. In this paper, we propose and evaluate a new ensemble-based Active Learning strategy and a stopping criterion based on Chao's Population Size Estimator that estimates the prevalence of relevant documents in the dataset. Our simulation study demonstrates that this criterion performs well on several datasets and is compared to other methods presented in the literature.
title Using Chao's Estimator as a Stopping Criterion for Technology-Assisted Review
topic Information Retrieval
url https://arxiv.org/abs/2404.01176