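Staff View: A Tabular Schedule Abstraction for Communication-Aware Evaluation of Pipeline-Parallel LLM Training :: Library Catalog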


Bibliographic Details
Main Authors:	Barley, Daniel; Leis, Jonathan; Klenk, Benjamin; Fröning, Holger
Format:	Preprint
Published:	2026
Subjects:	Distributed, Parallel, and Cluster Computing; Machine Learning
Online Access:	https://arxiv.org/abs/2605.24006

_version_	1866916040622997504
author	Barley, Daniel; Leis, Jonathan; Klenk, Benjamin; Fröning, Holger
author_facet	Barley, Daniel; Leis, Jonathan; Klenk, Benjamin; Fröning, Holger
contents	Pipeline parallelism is a key technique for distributed training of large language models because it reduces per-device parameter and activation memory. However, comparing pipeline schedules is difficult: analytical models expose structural quantities such as bubble ratios, while end-to-end hardware experiments are costly and system-specific. In this work, we introduce a tabular schedule abstraction and a unified multi-abstraction methodology that connects formula-based reasoning, idealized schedule tables, and communication-aware execution simulation. Using this framework, we compare GPipe, 1F1B, Chimera, and Hanayo (the latter in its restricted regime) across multiple modeled system configurations. Our results show that schedule rankings are not abstraction-invariant: communication can negate structural advantages suggested by bubble analysis alone. Under the assumptions considered here, GPipe and 1F1B are runtime-equivalent, but 1F1B achieves a lower activation-memory peak. Chimera is advantageous mainly at low microbatch counts and in communication-favorable regimes, while Hanayo is effective at its intended restricted operating point but remains sensitive to network bottlenecks. We further study an asymmetric Chimera-style placement, which does not reduce the global peak memory requirement but yields limited runtime gains in shallow pipelines. Overall, pipeline schedule quality is meaningful only in the context of the modeled execution environment.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_24006
institution	arXiv
publishDate	2026
record_format	arxiv
title	A Tabular Schedule Abstraction for Communication-Aware Evaluation of Pipeline-Parallel LLM Training
topic	Distributed, Parallel, and Cluster Computing; Machine Learning
url	https://arxiv.org/abs/2605.24006
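
The abstract contrasts bubble ratios, idealized schedule tables, and activation-memory peaks for GPipe and 1F1B. The following is a minimal illustrative sketch of what a stage-by-time-step schedule table can look like. It is not the authors' framework. It assumes, hypothetically, that every forward and backward step takes exactly one time step and that inter-stage communication is free, which is precisely the idealization the record says can mislead. It also treats a microbatch's activations as released as soon as its backward step runs. All function names (gpipe_order, one_f_one_b_order, build_table, peak_activations, bubble_ratio) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Op = Tuple[str, int]  # ("F", microbatch) or ("B", microbatch)
Table = List[List[Optional[str]]]  # table[stage][time_step] -> "F3", "B0", or None (idle)


def gpipe_order(p: int, m: int, stage: int) -> List[Op]:
    """GPipe-style order: all forwards, then all backwards."""
    return [("F", j) for j in range(m)] + [("B", j) for j in range(m)]


def one_f_one_b_order(p: int, m: int, stage: int) -> List[Op]:
    """1F1B-style order: warm-up forwards, steady one-forward-one-backward, cool-down backwards."""
    warmup = min(p - 1 - stage, m)
    ops: List[Op] = [("F", j) for j in range(warmup)]
    for i in range(m - warmup):
        ops += [("F", warmup + i), ("B", i)]
    ops += [("B", j) for j in range(m - warmup, m)]
    return ops


def build_table(p: int, m: int, order_fn: Callable[[int, int, int], List[Op]]) -> Table:
    """Run each stage's ops in order, each as soon as its dependency has finished.

    Assumption (hypothetical): unit-time ops, zero communication cost.
    """
    orders = [order_fn(p, m, s) for s in range(p)]
    next_idx = [0] * p
    finish: Dict[Tuple[int, str, int], int] = {}  # (stage, kind, mb) -> finish time
    table: Table = [[] for _ in range(p)]
    t = 0
    while any(next_idx[s] < len(orders[s]) for s in range(p)):
        started = []
        for s in range(p):
            if next_idx[s] == len(orders[s]):
                table[s].append(None)  # stage is done; idle until the makespan ends
                continue
            kind, j = orders[s][next_idx[s]]
            # A forward waits on the previous stage's forward; a backward waits on
            # the next stage's backward. The last stage's own F-before-B is
            # guaranteed by its per-stage order.
            if kind == "F":
                dep = (s - 1, "F", j) if s > 0 else None
            else:
                dep = (s + 1, "B", j) if s < p - 1 else None
            if dep is None or finish.get(dep, t + 1) <= t:
                started.append((s, kind, j))
                table[s].append(f"{kind}{j}")
            else:
                table[s].append(None)  # bubble: dependency not yet satisfied
        for s, kind, j in started:  # record finishes after the step (no same-step chaining)
            finish[(s, kind, j)] = t + 1
            next_idx[s] += 1
        t += 1
    return table


def peak_activations(table: Table) -> int:
    """Max number of microbatches whose activations any single stage holds at once."""
    peak = 0
    for row in table:
        live = 0
        for cell in row:
            if cell is None:
                continue
            live += 1 if cell[0] == "F" else -1
            peak = max(peak, live)
    return peak


def bubble_ratio(table: Table) -> float:
    """Fraction of idle (stage, time-step) cells over the whole makespan."""
    cells = [c for row in table for c in row]
    return cells.count(None) / len(cells)


if __name__ == "__main__":
    p, m = 4, 8
    for name, fn in [("GPipe", gpipe_order), ("1F1B", one_f_one_b_order)]:
        table = build_table(p, m, fn)
        print(name, "makespan:", len(table[0]),
              "bubble:", round(bubble_ratio(table), 3),
              "peak activations:", peak_activations(table))
        for s, row in enumerate(table):
            print(f"  stage {s}: " + " ".join(c or ".." for c in row))
```

Under these idealized assumptions, both orders should produce the same makespan, 2(m + p - 1) time steps, and the same bubble ratio, (p - 1)/(m + p - 1). This is consistent with the record's statement that GPipe and 1F1B are runtime-equivalent. The peak number of in-flight activations differs: m for GPipe versus min(p, m) for 1F1B. Adding per-message communication delays to the dependency check would be one way to move this toy table toward the communication-aware simulation the abstract describes. That extension is not shown here.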

Similar Items