Bibliographic Details
Main Authors: Liyanaarachchi, Sahan, Thilakarathna, Kanchana, Ulukus, Sennur
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2405.15744
author Liyanaarachchi, Sahan
Thilakarathna, Kanchana
Ulukus, Sennur
author_facet Liyanaarachchi, Sahan
Thilakarathna, Kanchana
Ulukus, Sennur
contents In many federated learning (FL) models, a common strategy to ensure progress in training is to wait, once the parameter server (PS) has broadcast the global model, for at least $M$ of the total $N$ clients to send back their local gradients before a reporting deadline $T$. If too few clients report back within the deadline, the round is considered failed and is restarted from scratch. If enough clients have responded, the round is deemed successful and the local gradients of all the clients that responded are used to update the global model. In either case, the clients that failed to report an update within the deadline will have wasted their computational resources. A tighter deadline (small $T$) combined with a larger number of required clients (large $M$) leads to many failed rounds and therefore to greater communication cost and wasted computation. However, a larger $T$ leads to longer rounds, whereas a smaller $M$ may lead to noisy gradients. Therefore, the parameters $M$ and $T$ need to be optimized so that communication cost and resource wastage are minimized while an acceptable convergence rate is maintained. In this regard, we show that the average age of a client at the PS appears explicitly in the theoretical convergence bound and can therefore be used as a metric to quantify the convergence of the global model. We provide an analytical scheme to select the parameters $M$ and $T$ in this setting.
format Preprint
id arxiv_https___arxiv_org_abs_2405_15744
institution arXiv
publishDate 2024
record_format arxiv
title CAFe: Cost and Age aware Federated Learning
topic Machine Learning
Distributed, Parallel, and Cluster Computing
Information Theory
url https://arxiv.org/abs/2405.15744
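The trade-off between $M$ and $T$ described in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each of the $N$ clients reports back independently after an exponentially distributed delay with a common rate; this delay model, the `rate` parameter, and all function names are hypothetical and are not taken from the paper. Under that assumption, the number of clients reporting before the deadline is binomial, and the number of attempts until a successful round is geometric.

```python
from math import comb, exp


def report_probability(deadline: float, rate: float) -> float:
    """Probability that one client reports within the deadline T.

    Assumes an exponential response time with the given rate
    (an illustrative assumption, not the paper's model).
    """
    return 1.0 - exp(-rate * deadline)


def round_success_probability(n: int, m: int, deadline: float, rate: float) -> float:
    """Probability that at least m of n independent clients report in time."""
    p = report_probability(deadline, rate)
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(m, n + 1))


def expected_attempts(n: int, m: int, deadline: float, rate: float) -> float:
    """Expected number of attempts until one round succeeds (geometric mean)."""
    q = round_success_probability(n, m, deadline, rate)
    return float("inf") if q == 0.0 else 1.0 / q


if __name__ == "__main__":
    # Tighter deadlines and larger M both increase the expected number of attempts.
    for m in (5, 8):
        for deadline in (0.5, 1.0, 2.0):
            attempts = expected_attempts(n=10, m=m, deadline=deadline, rate=1.0)
            print(f"M={m}, T={deadline}: expected attempts = {attempts:.2f}")
```

In this toy model, shrinking $T$ or raising $M$ lowers the per-round success probability and so raises the expected number of restarts, which matches the qualitative behavior the abstract describes. It does not capture the convergence bound or the age metric.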