Bibliographic Details
Main Authors: Halverson, Jonathan, Plazonic, Josko
Format: Digital resource
Language:
Published: Zenodo 2025
Online Access: https://doi.org/10.5281/zenodo.16696258
author Halverson, Jonathan
Plazonic, Josko
contents <p>In 2023, we introduced the <a href="https://github.com/PrincetonUniversity/jobstats">Jobstats</a> job monitoring platform, which provides user-facing commands and interfaces for inspecting the efficiency of Slurm jobs on CPU and GPU clusters. The platform builds on the Prometheus monitoring framework and the Grafana visualization toolkit and has been adopted by tens of institutions throughout the world. In this poster, we provide updates on the platform, including the release of a new component for mitigating underutilization. <a href="https://github.com/PrincetonUniversity/job_defense_shield">Job Defense Shield</a> is a software tool for identifying (or even automatically cancelling) user jobs that are underutilizing high-performance computing resources such as GPUs. Users are sent automated email alerts, while system administrators can view reports. Job Defense Shield is a tool for both job monitoring and user training.</p>
format Digital resource
id zenodo_https___doi_org_10_5281_zenodo_16696258
institution Zenodo
language
publishDate 2025
publisher Zenodo
record_format zenodo
title Combating Underutilization with the Jobstats Job Monitoring Platform
url https://doi.org/10.5281/zenodo.16696258