Bibliographic details
Main authors: Bohm, Ada; Baranek, Jakub; Garcia, Alberto
Format: Digital resource
Language: English
Published: Zenodo 2024
Subjects:
Online access: https://doi.org/10.5281/zenodo.19662358
Table of contents:
  • This document describes HyperQueue, a meta-scheduling system designed for high-performance computing (HPC) environments. It presents HyperQueue’s approach to task scheduling, which enables efficient execution of complex scientific workflows by dynamically managing computational resources. The document covers the system’s key features, including low-overhead task processing, automatic allocation management, output streaming, and emerging data transfer capabilities. It details how HyperQueue addresses challenges in HPC task scheduling by providing flexible resource allocation, supporting producer-consumer workflows, and offering resilience mechanisms. It also describes the system’s current capabilities and its integration with frameworks such as AiiDA, and outlines ongoing development to improve multi-node task handling and data communication in large-scale computational pipelines. The key features and integrations are summarized below.

    Key Features and Capabilities:
    • Efficient handling of millions of tasks with minimal overhead (~0.1 ms per task)
    • Dynamic resource allocation across multiple compute nodes
    • Automated management of cluster allocations
    • Built-in task output streaming to reduce filesystem load
    • Server resilience through journaling for crash recovery
    • Support for multi-node tasks and MPI applications
    • Dynamic task submission through the "open jobs" feature
    • Peer-to-peer data exchange between tasks (in development)

    Applications and Integration:
    • Successfully integrated with the AiiDA framework for high-throughput scientific workflows
    • Particularly effective for producer-consumer workflows and non-static DAG computations
    • Supports complex resource specifications for both tasks and nodes
    • Enables efficient handling of data dependencies in distributed computations