Bibliographic Details
Main Authors: Alabed, Sami, Belov, Daniel, Chrzaszcz, Bart, Franco, Juliana, Grewe, Dominik, Maclaurin, Dougal, Molloy, James, Natan, Tom, Norman, Tamara, Pan, Xiaoyue, Paszke, Adam, Rink, Norman A., Schaarschmidt, Michael, Sitdikov, Timur, Swietlik, Agnieszka, Vytiniotis, Dimitrios, Wee, Joel
Format: Preprint
Published: 2024
Subjects: Machine Learning; Distributed, Parallel, and Cluster Computing; Programming Languages
Online Access: https://arxiv.org/abs/2401.11202
author Alabed, Sami
Belov, Daniel
Chrzaszcz, Bart
Franco, Juliana
Grewe, Dominik
Maclaurin, Dougal
Molloy, James
Natan, Tom
Norman, Tamara
Pan, Xiaoyue
Paszke, Adam
Rink, Norman A.
Schaarschmidt, Michael
Sitdikov, Timur
Swietlik, Agnieszka
Vytiniotis, Dimitrios
Wee, Joel
contents Training modern large neural networks (NNs) requires a combination of parallelization strategies encompassing data, model, or optimizer sharding. As strategies increase in complexity, partitioning tools need to be 1) expressive, allowing the composition of simpler strategies, and 2) predictable, so that performance can be estimated analytically. We present PartIR, our design for an NN partitioning system. PartIR takes an incremental approach to rewriting and is hardware- and runtime-agnostic. We present a simple but powerful API for composing sharding strategies and a simulator to validate them. The process is driven by high-level, programmer-issued partitioning tactics, which can be either manual or automatic. Importantly, the tactics are specified separately from the model code, making them easy to change. We evaluate PartIR on several different models to demonstrate its predictability, expressibility, and ability to reach peak performance.
format Preprint
id arxiv_https___arxiv_org_abs_2401_11202
institution arXiv
publishDate 2024
record_format arxiv
title PartIR: Composing SPMD Partitioning Strategies for Machine Learning
topic Machine Learning
Distributed, Parallel, and Cluster Computing
Programming Languages
url https://arxiv.org/abs/2401.11202