Bibliographic Details
Main authors: Delorme, Stéphane, Mach, Leon, Paszkiewicz, Hubert, Ruiz, Richard
Format: Preprint
Published: 2025
Subjects: Physics Education; High Energy Physics - Experiment; Computational Physics
Online access: https://arxiv.org/abs/2509.09601
author Delorme, Stéphane
Mach, Leon
Paszkiewicz, Hubert
Ruiz, Richard
contents Extracting information from big data sets, both real and simulated, is a modern hallmark of the physical sciences. In practice, students face barriers to learning "Big Data" methods in undergraduate physics and astronomy curricula. To help alleviate some of these challenges, we present a simple, farm-to-table data analysis pipeline that collects, processes, and plots data from the 800k entries common to the arXiv preprint repository and the bibliographic database inSpireHEP. The pipeline employs contemporary research practices and can be implemented with open-source Python libraries common to undergraduate courses on Scientific Computing. To support the use of such pipelines in classroom contexts, we make public an example implementation, authored by two undergraduate physics students, that runs on off-the-shelf laptops. For advanced students, we discuss applications of the pipeline, including online DAQ monitoring and commercialization.
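The abstract describes a pipeline that collects, processes, and plots arXiv/inSpireHEP records. As a hedged illustration of the processing step behind the title's question, the sketch below groups citation counts by the weekday of submission. It is not the authors' implementation: the record layout `(paper_id, submitted_iso_date, citations)`, the function name `citations_by_weekday`, and the toy records are hypothetical, and nothing is fetched from arXiv or inSpireHEP.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median

# Fixed English names so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def citations_by_weekday(records):
    """Group citation counts by the weekday of submission.

    `records` is an iterable of (paper_id, submitted_iso_date, citations)
    tuples; this layout is a hypothetical stand-in for whatever the
    collection step actually produces.
    Returns {weekday: (n_papers, mean_citations, median_citations)},
    ordered Monday to Sunday and omitting days with no papers.
    """
    buckets = defaultdict(list)
    for _paper_id, submitted, citations in records:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        day = WEEKDAYS[date.fromisoformat(submitted).weekday()]
        buckets[day].append(citations)
    return {
        day: (len(buckets[day]), mean(buckets[day]), median(buckets[day]))
        for day in WEEKDAYS
        if day in buckets  # membership test does not create empty entries
    }


if __name__ == "__main__":
    # Toy, made-up records for illustration only (not real arXiv data).
    toy = [
        ("toy-1", "2025-01-01", 10),  # a Wednesday
        ("toy-2", "2025-01-08", 4),   # a Wednesday
        ("toy-3", "2025-01-02", 6),   # a Thursday
    ]
    for day, (n, avg, med) in citations_by_weekday(toy).items():
        print(f"{day:<9} n={n} mean={avg:.1f} median={med:.1f}")
```

Reporting the median alongside the mean is a design choice in this sketch: a handful of very highly cited papers can pull a weekday's mean up on their own. The plotting stage would then take this per-weekday summary as input, for example as a bar chart.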
format Preprint
id arxiv_https___arxiv_org_abs_2509_09601
institution arXiv
publishDate 2025
record_format arxiv
title Are arXiv submissions on Wednesday better cited? Introducing Big Data methods in undergraduate courses on scientific computing
topic Physics Education
High Energy Physics - Experiment
Computational Physics
url https://arxiv.org/abs/2509.09601