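| Main Authors: | Tonarelli, Melanie; Riva, Simone; Benedusi, Pietro; Ferrandi, Fabrizio; Krause, Rolf |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Distributed, Parallel, and Cluster Computing |
| Online Access: | https://arxiv.org/abs/2511.01573 |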
| _version_ | 1866918183642857472 |
|---|---|
| author | Tonarelli, Melanie; Riva, Simone; Benedusi, Pietro; Ferrandi, Fabrizio; Krause, Rolf |
| contents | We introduce a distributed adaptive quadrature method that formulates multidimensional integration as a hierarchical domain decomposition problem on multi-GPU architectures. The integration domain is recursively partitioned into subdomains whose refinement is guided by local error estimators. Each subdomain evolves independently on a GPU, which exposes a significant load imbalance as the adaptive process progresses. To address this challenge, we introduce a decentralised load redistribution scheme based on a cyclic round-robin policy. This strategy dynamically rebalances subdomains across devices through non-blocking, CUDA-aware MPI communication that overlaps with computation. Compared with a state-of-the-art GPU-tailored package, the proposed strategy has two main advantages: higher efficiency in high dimensions, and improved robustness with respect to the integrand regularity and the target accuracy. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2511_01573 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Adaptive Multidimensional Quadrature on Multi-GPU Systems |
| topic | Distributed, Parallel, and Cluster Computing |
| url | https://arxiv.org/abs/2511.01573 |
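The abstract combines two mechanisms: recursive subdivision of the integration domain driven by local error estimates, and a decentralised, cyclic round-robin redistribution of subdomains between devices. The Python sketch below illustrates both on a single machine. It is not the authors' implementation. The "devices" are plain Python lists standing in for GPUs. The following choices are assumptions made for illustration: the midpoint-rule error estimator, bisection along the axis with the largest estimated error, the global refinement threshold `tol / N`, and the partner-rotation rule `(p + shift) % P`. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]                   # (lower corner, upper corner)
Region = Tuple[Box, float, float, int]      # (box, integral estimate, error estimate, split axis)
Integrand = Callable[[Sequence[float]], float]


def midpoint(f: Integrand, box: Box) -> float:
    """One-point midpoint rule on an axis-aligned box."""
    lo, hi = box
    center = [0.5 * (a + b) for a, b in zip(lo, hi)]
    volume = math.prod(b - a for a, b in zip(lo, hi))
    return f(center) * volume


def bisect(box: Box, k: int) -> Tuple[Box, Box]:
    """Split a box in half along axis k."""
    lo, hi = box
    mid = 0.5 * (lo[k] + hi[k])
    return (lo, hi[:k] + (mid,) + hi[k + 1:]), (lo[:k] + (mid,) + lo[k + 1:], hi)


def make_region(f: Integrand, box: Box) -> Region:
    """Assumed local error estimator: for each axis, compare the midpoint rule on the box
    with the midpoint rule on its two halves; keep the axis with the largest difference."""
    coarse = midpoint(f, box)
    value, err, axis = coarse, -1.0, 0
    for k in range(len(box[0])):
        fine = sum(midpoint(f, half) for half in bisect(box, k))
        if abs(fine - coarse) > err:
            value, err, axis = fine, abs(fine - coarse), k
    return (box, value, err, axis)


def refine(f: Integrand, regions: List[Region], threshold: float) -> List[Region]:
    """Device-local step: bisect every region whose error estimate exceeds the threshold."""
    out: List[Region] = []
    for region in regions:
        box, _, err, axis = region
        if err > threshold:
            out.extend(make_region(f, child) for child in bisect(box, axis))
        else:
            out.append(region)
    return out


def redistribute(devices: List[List[Region]], step: int) -> None:
    """Hypothetical cyclic round-robin exchange: device p compares its load with device
    (p + shift) % P, where shift cycles through 1..P-1, and sends half of any surplus."""
    P = len(devices)
    if P < 2:
        return
    shift = 1 + step % (P - 1)
    loads = [len(dev) for dev in devices]   # snapshot, mimicking a simultaneous exchange
    outgoing = []
    for p in range(P):
        q = (p + shift) % P
        surplus = (loads[p] - loads[q]) // 2
        if surplus > 0:
            outgoing.append((q, [devices[p].pop() for _ in range(surplus)]))
    for q, regions in outgoing:
        devices[q].extend(regions)


def adaptive_integrate(f: Integrand, lo: Point, hi: Point, n_devices: int = 4,
                       tol: float = 1e-3, max_steps: int = 200):
    # Initial decomposition: one slab along axis 0 per device.
    width = (hi[0] - lo[0]) / n_devices
    devices = [[make_region(f, ((lo[0] + p * width,) + lo[1:],
                                (lo[0] + (p + 1) * width,) + hi[1:]))]
               for p in range(n_devices)]
    for step in range(max_steps):
        # Global reductions (an allreduce in a distributed setting).
        n_regions = sum(len(dev) for dev in devices)
        total_err = sum(r[2] for dev in devices for r in dev)
        if total_err <= tol:
            break
        threshold = tol / n_regions
        devices = [refine(f, dev, threshold) for dev in devices]
        redistribute(devices, step)
    value = sum(r[1] for dev in devices for r in dev)
    err = sum(r[2] for dev in devices for r in dev)
    return value, err, [len(dev) for dev in devices]
```

For example, `adaptive_integrate(lambda x: math.exp(x[0] + x[1]), (0.0, 0.0), (1.0, 1.0))` approximates the exact value (e - 1)^2 and returns the final number of regions held by each simulated device. Taking a snapshot of the loads before any region moves only mimics the simultaneous exchange. The record describes non-blocking, CUDA-aware MPI communication that overlaps with computation, which this sequential sketch does not model.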