Internformat: :: Library Catalog

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Williams, Conor John, Elliott, James
Format:	Preprint
Veröffentlicht:	2024
Schlagworte:	Distributed, Parallel, and Cluster Computing
Online-Zugang:	https://arxiv.org/abs/2402.18480
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

_version_	1866917600196296704
author	Williams, Conor John Elliott, James
author_facet	Williams, Conor John Elliott, James
contents	Fully-strict fork-join parallelism is a powerful model for shared-memory programming due to its optimal time scaling and strong bounds on memory scaling. The latter is rarely achieved due to the difficulty of implementing continuation stealing in traditional High Performance Computing (HPC) languages -- where it is often impossible without modifying the compiler or resorting to non-portable techniques. We demonstrate how stackless coroutines (a new feature in C++20) can enable fully-portable continuation stealing and present libfork a lock-free fine-grained parallelism library, combining coroutines with user-space, geometric segmented-stacks. We show our approach is able to achieve optimal time/memory scaling, both theoretically and empirically, across a variety of benchmarks. Compared to openMP (libomp), libfork is on average 7.2x faster and consumes 10x less memory. Similarly, compared to Intel's TBB, libfork is on average 2.7x faster and consumes 6.2x less memory. Additionally, we introduce non-uniform memory access (NUMA) optimizations for schedulers that demonstrate performance matching busy-waiting schedulers.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_18480
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Libfork: portable continuation-stealing with stackless coroutines Williams, Conor John Elliott, James Distributed, Parallel, and Cluster Computing Fully-strict fork-join parallelism is a powerful model for shared-memory programming due to its optimal time scaling and strong bounds on memory scaling. The latter is rarely achieved due to the difficulty of implementing continuation stealing in traditional High Performance Computing (HPC) languages -- where it is often impossible without modifying the compiler or resorting to non-portable techniques. We demonstrate how stackless coroutines (a new feature in C++20) can enable fully-portable continuation stealing and present libfork a lock-free fine-grained parallelism library, combining coroutines with user-space, geometric segmented-stacks. We show our approach is able to achieve optimal time/memory scaling, both theoretically and empirically, across a variety of benchmarks. Compared to openMP (libomp), libfork is on average 7.2x faster and consumes 10x less memory. Similarly, compared to Intel's TBB, libfork is on average 2.7x faster and consumes 6.2x less memory. Additionally, we introduce non-uniform memory access (NUMA) optimizations for schedulers that demonstrate performance matching busy-waiting schedulers.
title	Libfork: portable continuation-stealing with stackless coroutines
topic	Distributed, Parallel, and Cluster Computing
url	https://arxiv.org/abs/2402.18480

Ähnliche Einträge