Bibliographic Details
Main Authors: Coleman, Tainã; Ahmed, Hena; Shende, Ravi; Perez, Ismael; Altintaş, Ïlkay
Format: Preprint
Published: 2025
Subjects: Distributed, Parallel, and Cluster Computing; Artificial Intelligence
Online Access: https://arxiv.org/abs/2506.13730
author Coleman, Tainã
Ahmed, Hena
Shende, Ravi
Perez, Ismael
Altintaş, Ïlkay
contents Distributed computing systems are essential for meeting the demands of modern applications, yet transitioning from single-system to distributed environments presents significant challenges. Misallocating resources in shared systems can lead to resource contention, system instability, degraded performance, priority inversion, inefficient utilization, increased latency, and environmental impact. We present BanditWare, an online recommendation system that dynamically selects the most suitable hardware for applications using a contextual multi-armed bandit algorithm. BanditWare balances exploration and exploitation, gradually refining its hardware recommendations based on observed application performance while continuing to explore potentially better options. Unlike traditional statistical and machine learning approaches that rely heavily on large historical datasets, BanditWare operates online, learning and adapting in real time as new workloads arrive. We evaluated BanditWare on three workflow applications: Cycles (a scientific workflow for agricultural science), BurnPro3D (a web-based platform for fire science), and a matrix multiplication application. Designed for seamless integration with the National Data Platform (NDP), BanditWare enables users of all experience levels to optimize resource allocation efficiently.
format Preprint
id arxiv_https___arxiv_org_abs_2506_13730
institution arXiv
publishDate 2025
record_format arxiv
title BanditWare: A Contextual Bandit-based Framework for Hardware Prediction
topic Distributed, Parallel, and Cluster Computing
Artificial Intelligence
url https://arxiv.org/abs/2506.13730