| Main authors: | Kulcsar, Jeremy; Kungurtsev, Vyacheslav; Korpas, Georgios; Giaconi, Giulio; Shoosmith, William |
|---|---|
| Material type: | Preprint |
| Published: | 2025 |
| Subjects: | Distributed, Parallel, and Cluster Computing; Machine Learning |
| Links: | https://arxiv.org/abs/2502.07021 |
| _version_ | 1866915779375529984 |
|---|---|
| author | Kulcsar, Jeremy; Kungurtsev, Vyacheslav; Korpas, Georgios; Giaconi, Giulio; Shoosmith, William |
| author_facet | Kulcsar, Jeremy; Kungurtsev, Vyacheslav; Korpas, Georgios; Giaconi, Giulio; Shoosmith, William |
| contents | We study distributed Sinkhorn iterations for entropy-regularized optimal transport when the Gibbs kernel operator is row-partitioned across c workers and cannot be centralized. We present Federated Sinkhorn, two exact synchronous protocols that exchange only scaling-vector slices: (i) an All-to-All scheme implemented by Allgather, and (ii) a Star (parameter-server) scheme implemented by client-to-server sends and server-to-client broadcasts. For both, we derive closed-form per-iteration compute, communication, and memory costs under an alpha-beta latency-bandwidth model, and show that the distributed iterates match centralized Sinkhorn under standard positivity assumptions. Multi-node CPU/GPU experiments validate the model and show that repeated global scaling exchange quickly becomes the dominant bottleneck as c increases. We also report an optional bounded-delay asynchronous schedule and an optional privacy measurement layer for communicated log-scalings. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2502_07021 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Federated Sinkhorn |
| topic | Distributed, Parallel, and Cluster Computing; Machine Learning |
| url | https://arxiv.org/abs/2502.07021 |
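
The All-to-All scheme described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function names (`gibbs_kernel`, `split`, `allgather`, `all_to_all_sinkhorn`) are hypothetical, the Allgather is simulated by list concatenation rather than a real MPI call, and each simulated worker is assumed to hold a row block of K together with the matching row block of K^T so that both scaling updates need only slices of the opposite scaling vector. The storage layout used in the preprint may differ.

```python
import math
import random


def gibbs_kernel(cost, eps):
    """Elementwise K = exp(-C / eps); strictly positive for finite costs."""
    return [[math.exp(-c_ij / eps) for c_ij in row] for row in cost]


def matvec(rows, x):
    """Multiply a list of matrix rows by the vector x."""
    return [sum(k * xi for k, xi in zip(row, x)) for row in rows]


def transpose(m):
    return [list(col) for col in zip(*m)]


def centralized_sinkhorn(K, a, b, iters):
    """Reference iteration: u = a / (K v), then v = b / (K^T u)."""
    Kt = transpose(K)
    v = [1.0] * len(b)
    for _ in range(iters):
        u = [ai / s for ai, s in zip(a, matvec(K, v))]
        v = [bi / s for bi, s in zip(b, matvec(Kt, u))]
    return u, v


def split(items, c):
    """Partition a list into c contiguous blocks (sizes differ by at most 1)."""
    q, r = divmod(len(items), c)
    blocks, start = [], 0
    for j in range(c):
        stop = start + q + (1 if j < r else 0)
        blocks.append(items[start:stop])
        start = stop
    return blocks


def allgather(slices):
    """Simulated Allgather: every worker receives the concatenated slices."""
    return [x for s in slices for x in s]


def all_to_all_sinkhorn(K, a, b, iters, c):
    """Each of c simulated workers updates only its own slices of u and v."""
    # Assumption of this sketch: worker j stores a row block of K and of K^T.
    K_blocks, a_blocks = split(K, c), split(a, c)
    Kt_blocks, b_blocks = split(transpose(K), c), split(b, c)
    v = [1.0] * len(b)
    for _ in range(iters):
        u_slices = [
            [ai / s for ai, s in zip(a_blocks[j], matvec(K_blocks[j], v))]
            for j in range(c)
        ]
        u = allgather(u_slices)  # only scaling-vector slices are exchanged
        v_slices = [
            [bi / s for bi, s in zip(b_blocks[j], matvec(Kt_blocks[j], u))]
            for j in range(c)
        ]
        v = allgather(v_slices)
    return u, v


if __name__ == "__main__":
    random.seed(0)
    n, c = 6, 3
    cost = [[random.random() for _ in range(n)] for _ in range(n)]
    K = gibbs_kernel(cost, eps=0.5)
    a = [1.0 / n] * n
    b = [1.0 / n] * n
    u_ref, v_ref = centralized_sinkhorn(K, a, b, iters=50)
    u_fed, v_fed = all_to_all_sinkhorn(K, a, b, iters=50, c=c)
    gap = max(abs(x - y) for x, y in zip(u_ref + v_ref, u_fed + v_fed))
    print("max difference between centralized and partitioned scalings:", gap)
```

In this simulation every slice is computed from the same kernel rows and the same full scaling vector as in the centralized loop, which is the sense in which the partitioned iterates reproduce the centralized ones. The kernel itself never leaves its block, and each iteration requires two global exchanges of scaling slices, the communication step the abstract identifies as the bottleneck as c grows.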