| Main Authors: | Halverson, Jonathan; Plazonic, Josko |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2025 |
| Online Access: | https://doi.org/10.5281/zenodo.16696258 |
| _version_ | 1866902223556968448 |
|---|---|
| author | Halverson, Jonathan; Plazonic, Josko |
| author_facet | Halverson, Jonathan; Plazonic, Josko |
| contents | In 2023, we introduced the [Jobstats](https://github.com/PrincetonUniversity/jobstats) job monitoring platform, which provides user-facing commands and interfaces for inspecting the efficiency of Slurm jobs on CPU and GPU clusters. The platform builds on the Prometheus monitoring framework and the Grafana visualization toolkit, and it has been adopted by tens of institutions throughout the world. In this poster, we provide updates on the platform, including the release of a new component for mitigating underutilization. [Job Defense Shield](https://github.com/PrincetonUniversity/job_defense_shield) is a software tool for identifying (or even automatically cancelling) user jobs that underutilize high-performance computing resources such as GPUs. Users receive automated email alerts, while system administrators can view reports. Job Defense Shield serves as a tool for both job monitoring and user training. |
| format | Digital resource |
| id | zenodo_https___doi_org_10_5281_zenodo_16696258 |
| institution | Zenodo |
| language | |
| publishDate | 2025 |
| publisher | Zenodo |
| record_format | zenodo |
| title | Combating Underutilization with the Jobstats Job Monitoring Platform |
| url | https://doi.org/10.5281/zenodo.16696258 |