Staff View: Surrogate Modeling for Scalable Evaluation of Distributed Computing Systems for HEP Applications :: Library Catalog

Bibliographic Details
Main Authors:	Schmid, Larissa; Horzela, Maximilian; Zhyla, Valerii; Giffels, Manuel; Quast, Günter; Koziolek, Anne
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing; Performance; High Energy Physics - Experiment
Online Access:	https://arxiv.org/abs/2502.12741

author	Schmid, Larissa; Horzela, Maximilian; Zhyla, Valerii; Giffels, Manuel; Quast, Günter; Koziolek, Anne
contents	The Worldwide LHC Computing Grid (WLCG) provides the robust computing infrastructure essential for the LHC experiments by integrating global computing resources into a cohesive entity. Simulating different compute models is a feasible way to evaluate future adaptations that can cope with increased demands. However, running these simulations involves a trade-off between accuracy and scalability. For example, while the simulator DCSim can provide accurate results, it does not scale well with the size of the simulated platform. Using generative machine learning as a surrogate is a candidate approach for overcoming this challenge. In this work, we evaluate three different machine learning models for the simulation of distributed computing systems and assess their ability to generalize to unseen situations. We show that these models can predict central observables derived from execution traces of compute jobs with approximate accuracy but with execution times that are orders of magnitude faster. Furthermore, we identify opportunities to improve the accuracy and generalizability of the predictions.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_12741
institution	arXiv
publishDate	2025
record_format	arxiv
title	Surrogate Modeling for Scalable Evaluation of Distributed Computing Systems for HEP Applications
topic	Distributed, Parallel, and Cluster Computing; Performance; High Energy Physics - Experiment
url	https://arxiv.org/abs/2502.12741

Similar Items