Bibliographic Details
Main Authors: Magoulès, Frédéric; Gbikpi-Benissan, Guillaume
Format: Preprint
Published: 2023
Subjects: Distributed, Parallel, and Cluster Computing
Online Access: https://arxiv.org/abs/2312.17558
_version_ 1866916076512608256
author Magoulès, Frédéric
Gbikpi-Benissan, Guillaume
author_facet Magoulès, Frédéric
Gbikpi-Benissan, Guillaume
contents Convergence of classical parallel iterations is detected by performing a reduction operation at each iteration in order to compute a residual error relative to a potential solution vector. To efficiently run asynchronous iterations, blocking communication requests are avoided, which makes it hard to isolate and handle any global vector. While some termination protocols were proposed for asynchronous iterations, only very few of them are based on global residual computation and guarantee effective convergence. But the most effective and efficient existing solutions feature two reduction operations, which constitutes an important factor of termination delay. In this paper, we present new, non-intrusive, protocols to compute a residual error under asynchronous iterations, requiring only one reduction operation. Various communication models show that some heuristics can even be introduced and formally evaluated. Extensive experiments with up to 5600 processor cores confirm the practical effectiveness and efficiency of our approach.
format Preprint
id arxiv_https___arxiv_org_abs_2312_17558
institution arXiv
publishDate 2023
record_format arxiv
title Distributed convergence detection based on global residual error under asynchronous iterations
topic Distributed, Parallel, and Cluster Computing
url https://arxiv.org/abs/2312.17558
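
The first sentence of the abstract describes the classical synchronous scheme: one reduction per iteration yields a global residual error, which decides termination. The following is a minimal sketch of that baseline only, not of the protocols proposed in the preprint. It assumes a Jacobi iteration on a small dense system, and it simulates the workers and the reduction in a single process; the names `local_residual_sq`, `jacobi_with_reduction`, and `partitions` are hypothetical and chosen for illustration.

```python
import math


def local_residual_sq(A, b, x, rows):
    """Sum of squared residual entries (b - A x)_i over the rows owned by one worker."""
    total = 0.0
    for i in rows:
        r_i = b[i] - sum(A[i][j] * x[j] for j in range(len(x)))
        total += r_i * r_i
    return total


def jacobi_with_reduction(A, b, partitions, tol=1e-10, max_iter=1000):
    """Synchronous Jacobi; one simulated reduction per iteration decides termination.

    `partitions` lists the row indices owned by each simulated worker
    (an assumption of this sketch, standing in for real processes).
    """
    n = len(b)
    x = [0.0] * n
    global_residual = float("inf")
    for k in range(1, max_iter + 1):
        # Each worker updates its own rows from the previous global iterate.
        x_new = x[:]
        for rows in partitions:
            for i in rows:
                off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
                x_new[i] = (b[i] - off_diag) / A[i][i]
        x = x_new
        # Simulated reduction: combine the per-worker partial squared norms.
        partials = [local_residual_sq(A, b, x, rows) for rows in partitions]
        global_residual = math.sqrt(sum(partials))
        if global_residual < tol:
            return x, k, global_residual
    return x, max_iter, global_residual


if __name__ == "__main__":
    # Diagonally dominant test system whose exact solution is (1, 1, 1).
    A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    b = [5.0, 6.0, 5.0]
    x, iterations, residual = jacobi_with_reduction(A, b, [[0, 1], [2]])
    print(x, iterations, residual)
```

In a real distributed run the sum over `partials` would be a blocking collective, which is the synchronization point that, according to the abstract, asynchronous iterations try to avoid.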