Bibliographic Details
Main Authors: Kulcsar, Jeremy, Kungurtsev, Vyacheslav, Korpas, Georgios, Giaconi, Giulio, Shoosmith, William
Format: Preprint
Published: 2025
Subjects: Distributed, Parallel, and Cluster Computing; Machine Learning
Links: https://arxiv.org/abs/2502.07021
contents We study distributed Sinkhorn iterations for entropy-regularized optimal transport when the Gibbs kernel operator is row-partitioned across c workers and cannot be centralized. We present Federated Sinkhorn, a pair of exact synchronous protocols that exchange only slices of the scaling vectors: (i) an All-to-All scheme implemented with Allgather, and (ii) a Star (parameter-server) scheme implemented with client-to-server sends and server-to-client broadcasts. For both, we derive closed-form per-iteration compute, communication, and memory costs under an alpha-beta (latency-bandwidth) model, and show that the distributed iterates match those of centralized Sinkhorn under standard positivity assumptions. Multi-node CPU/GPU experiments validate the model and show that the repeated global exchange of scaling vectors quickly becomes the dominant bottleneck as c increases. We also report an optional bounded-delay asynchronous schedule and an optional privacy-measurement layer for the communicated log-scalings.
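The abstract states that both protocols reproduce centralized Sinkhorn iterates exactly while exchanging only scaling-vector slices. A minimal single-process NumPy simulation of that idea is sketched below. The layout is an assumption, not taken from the paper: each simulated worker is assumed to hold a contiguous row block of K and a contiguous row block of K^T (that is, a column block of K), together with the matching slices of the marginals a and b. The helper names `federated_sinkhorn`, `allgather_exchange`, and `star_exchange` are hypothetical. The sketch models only which vectors are exchanged; it does not model the alpha-beta communication cost.

```python
import numpy as np


def centralized_sinkhorn(K, a, b, iters):
    """Reference Sinkhorn: alternate u = a / (K v) and v = b / (K^T u)."""
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u, v


def allgather_exchange(slices):
    """Simulated Allgather: every worker receives the concatenation of all slices."""
    return np.concatenate(slices)


def star_exchange(slices):
    """Simulated Star: clients send slices to a server, which concatenates them
    and broadcasts the same full vector back to every client."""
    server_buffer = [s.copy() for s in slices]  # client -> server sends
    return np.concatenate(server_buffer)        # server -> client broadcast


def federated_sinkhorn(K, a, b, c, iters, exchange=allgather_exchange):
    """Single-process simulation of c workers. ASSUMPTION (hypothetical layout):
    worker i owns a contiguous row block of K and a contiguous row block of K^T,
    plus the matching slices of the marginals a and b."""
    n, m = K.shape
    row_blocks = np.array_split(np.arange(n), c)  # I_i: rows of K on worker i
    col_blocks = np.array_split(np.arange(m), c)  # J_i: rows of K^T on worker i
    K_rows = [K[I, :] for I in row_blocks]
    KT_rows = [K[:, J].T for J in col_blocks]
    a_parts = [a[I] for I in row_blocks]
    b_parts = [b[J] for J in col_blocks]

    v = np.ones(m)  # replicated on every worker at the start
    for _ in range(iters):
        # Local u-update: needs the full v, but only this worker's rows of K.
        u = exchange([a_i / (K_i @ v) for a_i, K_i in zip(a_parts, K_rows)])
        # Local v-update: needs the full u, but only this worker's rows of K^T.
        v = exchange([b_j / (KT_j @ u) for b_j, KT_j in zip(b_parts, KT_rows)])
    return u, v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, eps = 12, 9, 0.5
    x, y = rng.random((n, 2)), rng.random((m, 2))
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean cost
    K = np.exp(-C / eps)                                     # Gibbs kernel, strictly positive
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)

    u_ref, v_ref = centralized_sinkhorn(K, a, b, iters=200)
    for name, ex in [("all-to-all", allgather_exchange), ("star", star_exchange)]:
        u, v = federated_sinkhorn(K, a, b, c=3, iters=200, exchange=ex)
        print(name, np.allclose(u, u_ref), np.allclose(v, v_ref))
```

In this simulation both exchange helpers simply reassemble the global scaling vector, so the All-to-All and Star schedules yield the same iterates as the centralized loop (up to floating-point rounding). The two schedules differ only in their message pattern and therefore in cost, which is the part the paper analyzes under its alpha-beta model.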
id arxiv_https___arxiv_org_abs_2502_07021
institution arXiv
record_format arxiv
title Federated Sinkhorn
topic Distributed, Parallel, and Cluster Computing
Machine Learning
url https://arxiv.org/abs/2502.07021