Staff View :: Library Catalog


Bibliographic Details
Main Authors:	Wang, Yuanli; Huang, Lei
Format:	Preprint
Published:	2024
Subjects:	Distributed, Parallel, and Cluster Computing; Machine Learning
Online Access:	https://arxiv.org/abs/2406.01774

_version_	1866929371809316864
author	Wang, Yuanli; Huang, Lei
contents	Federated Learning (FL) is a privacy-preserving machine learning paradigm in which a global model is trained in situ across a large number of distributed edge devices. These systems often comprise millions of user devices, and only a subset of the available devices can be used for training in each epoch. Designing a device selection strategy is challenging because devices are highly heterogeneous in both their system resources and their training data. This heterogeneity makes device selection crucial for timely model convergence and sufficient model accuracy. To tackle the FL client heterogeneity problem, various client selection algorithms have been developed, showing promising improvements in model coverage and accuracy. In this work, we study the overhead of client selection algorithms in a large-scale FL environment. We then propose an efficient algorithm for calculating data distribution summaries that reduces this overhead in real-world large-scale FL environments. The evaluation shows that our proposed solution achieves up to a 30x reduction in data summary time and up to a 360x reduction in clustering time.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_01774
institution	arXiv
publishDate	2024
record_format	arxiv
title	Efficient Data Distribution Estimation for Accelerated Federated Learning
topic	Distributed, Parallel, and Cluster Computing; Machine Learning
url	https://arxiv.org/abs/2406.01774

Similar Items