Saved in:
Bibliographic Details
Main Author: Simakov, nikolay
Format: Recurso digital
Language:
Published: Zenodo 2024
Online Access:https://doi.org/10.5281/zenodo.15724847
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866901165111771136
author Simakov, nikolay
author_facet Simakov, nikolay
contents <p>High-performance computing (HPC) resources are critical for scientific and engineering computations, but they come with significant costs, making optimal utilization essential. While substantial effort is often invested in performance optimization during the initial deployment, regular performance monitoring tends to diminish during the production phase, potentially leading to undetected degradation in software and hardware performance. The XDMoD Application Kernel Performance Monitoring Module addresses this by automatically executing a suite of applications and benchmarks on a daily basis to identify performance degradation proactively. This module is designed to generate meaningful performance data while keeping short wall times to minimize impact on users. It also includes a performance degradation detection algorithm that can trigger regular or problem-specific email notifications. Operational since 2011, this module has successfully monitored the performance of XSEDE and ACCESS resources. This presentation provides an overview of the module’s capabilities and showcases practical use cases illustrating its effectiveness.</p>
format Recurso digital
id zenodo_https___doi_org_10_5281_zenodo_15724847
institution Zenodo
language
publishDate 2024
publisher Zenodo
record_format zenodo
spellingShingle Benchmarking and Continuous Performance Monitoring of HPC Resources using the XDMoD Application Kernel Module
Simakov, nikolay
<p>High-performance computing (HPC) resources are critical for scientific and engineering computations, but they come with significant costs, making optimal utilization essential. While substantial effort is often invested in performance optimization during the initial deployment, regular performance monitoring tends to diminish during the production phase, potentially leading to undetected degradation in software and hardware performance. The XDMoD Application Kernel Performance Monitoring Module addresses this by automatically executing a suite of applications and benchmarks on a daily basis to identify performance degradation proactively. This module is designed to generate meaningful performance data while keeping short wall times to minimize impact on users. It also includes a performance degradation detection algorithm that can trigger regular or problem-specific email notifications. Operational since 2011, this module has successfully monitored the performance of XSEDE and ACCESS resources. This presentation provides an overview of the module’s capabilities and showcases practical use cases illustrating its effectiveness.</p>
title Benchmarking and Continuous Performance Monitoring of HPC Resources using the XDMoD Application Kernel Module
url https://doi.org/10.5281/zenodo.15724847