Bibliographic Details
Main Author: Ghribi, Adnan
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2510.09376
author Ghribi, Adnan
author_facet Ghribi, Adnan
contents We present an open-source pipeline for generating a "living review" of artificial intelligence (AI) and machine learning (ML) applications in accelerator physics and technologies. Traditional review articles provide static snapshots that are quickly outdated by the rapid pace of research. The presented system automatically harvests publications from multiple bibliographic sources (arXiv, InspireHEP, HAL, OpenAlex, Crossref, and Springer), deduplicates entries, applies semantic filtering to ensure accelerator and ML relevance, and classifies papers into thematic categories. The resulting curated dataset is exported in JSON, HTML, PDF, and BibTeX formats, enabling continuous updates and integration with web frameworks. We describe the methodology, including semantic similarity filtering using sentence-transformer embeddings, threshold calibration, and expert-informed classification. The results demonstrate robust filtering of approximately 12,000 raw papers per month into a focused corpus of approximately 2% relevant works. The pipeline provides the basis for an evolving, community-driven review of AI/ML in accelerator science.
format Preprint
id arxiv_https___arxiv_org_abs_2510_09376
institution arXiv
publishDate 2025
record_format arxiv
title A Living Review Pipeline for AI/ML Applications in Accelerator Physics
topic Accelerator Physics
url https://arxiv.org/abs/2510.09376
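
The deduplication and similarity-threshold filtering steps named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the sample records, the made-up DOIs, the reference phrases, and the threshold of 0.2 are all assumptions chosen for illustration. The pipeline described above uses sentence-transformer embeddings; here a simple bag-of-words vector stands in for them so the example runs with the standard library only. Requiring a record to pass both the accelerator and the ML check is likewise an assumed reading of "accelerator and ML relevance".

```python
import math
import re
from collections import Counter


def normalize(text):
    """Lowercase and collapse punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each DOI or normalized title."""
    seen = set()
    unique = []
    for rec in records:
        keys = {("title", normalize(rec["title"]))}
        if rec.get("doi"):
            keys.add(("doi", rec["doi"].lower()))
        if keys & seen:
            continue  # the same paper was already harvested from another source
        seen |= keys
        unique.append(rec)
    return unique


def embed(text):
    """Bag-of-words stand-in for a sentence-transformer embedding (assumption)."""
    return Counter(normalize(text).split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_relevant(text, accel_ref, ml_ref, threshold=0.2):
    """Require similarity to both reference vectors to reach the threshold."""
    vec = embed(text)
    return cosine(vec, accel_ref) >= threshold and cosine(vec, ml_ref) >= threshold


# Hypothetical reference phrases and records; the DOIs are placeholders.
ACCEL_REF = embed("particle accelerator beam control")
ML_REF = embed("machine learning neural network")

records = [
    {"title": "Machine learning for beam control in a particle accelerator",
     "doi": "10.1000/example.1"},
    {"title": "Machine Learning for Beam Control in a Particle Accelerator!",
     "doi": None},
    {"title": "Deep learning for image captioning",
     "doi": "10.1000/example.2"},
]

unique = deduplicate(records)
kept = [r for r in unique if is_relevant(r["title"], ACCEL_REF, ML_REF)]

for rec in kept:
    print(rec["title"])
```

In a real setting the threshold would need calibration against labeled examples, as the abstract notes, and `embed` would be replaced by a learned sentence embedding rather than word counts.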