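| Main Authors: | Alabed, Sami; Belov, Daniel; Chrzaszcz, Bart; Franco, Juliana; Grewe, Dominik; Maclaurin, Dougal; Molloy, James; Natan, Tom; Norman, Tamara; Pan, Xiaoyue; Paszke, Adam; Rink, Norman A.; Schaarschmidt, Michael; Sitdikov, Timur; Swietlik, Agnieszka; Vytiniotis, Dimitrios; Wee, Joel |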
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Machine Learning; Distributed, Parallel, and Cluster Computing; Programming Languages |
| Online Access: | https://arxiv.org/abs/2401.11202 |
| _version_ | 1866918009934708736 |
|---|---|
| author | Alabed, Sami; Belov, Daniel; Chrzaszcz, Bart; Franco, Juliana; Grewe, Dominik; Maclaurin, Dougal; Molloy, James; Natan, Tom; Norman, Tamara; Pan, Xiaoyue; Paszke, Adam; Rink, Norman A.; Schaarschmidt, Michael; Sitdikov, Timur; Swietlik, Agnieszka; Vytiniotis, Dimitrios; Wee, Joel |
| author_facet | Alabed, Sami; Belov, Daniel; Chrzaszcz, Bart; Franco, Juliana; Grewe, Dominik; Maclaurin, Dougal; Molloy, James; Natan, Tom; Norman, Tamara; Pan, Xiaoyue; Paszke, Adam; Rink, Norman A.; Schaarschmidt, Michael; Sitdikov, Timur; Swietlik, Agnieszka; Vytiniotis, Dimitrios; Wee, Joel |
| contents | Training of modern large neural networks (NN) requires a combination of parallelization strategies encompassing data, model, or optimizer sharding. When strategies increase in complexity, it becomes necessary for partitioning tools to be 1) expressive, allowing the composition of simpler strategies, and 2) predictable to estimate performance analytically. We present PartIR, our design for a NN partitioning system. PartIR is focused on an incremental approach to rewriting and is hardware-and-runtime agnostic. We present a simple but powerful API for composing sharding strategies and a simulator to validate them. The process is driven by high-level programmer-issued partitioning tactics, which can be both manual and automatic. Importantly, the tactics are specified separately from the model code, making them easy to change. We evaluate PartIR on several different models to demonstrate its predictability, expressibility, and ability to reach peak performance. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2401_11202 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | PartIR: Composing SPMD Partitioning Strategies for Machine Learning |
| topic | Machine Learning; Distributed, Parallel, and Cluster Computing; Programming Languages |
| url | https://arxiv.org/abs/2401.11202 |