Saved in:
Bibliographic Details
Main Author: Morishita, Gota
Format: Preprint
Published: 2026
Subjects:
Online Access:https://arxiv.org/abs/2604.26604
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866911632724066304
author Morishita, Gota
author_facet Morishita, Gota
contents Federated learning (FL) trains a shared model from updates contributed by distributed clients, often implicitly assuming that contributing clients are representative of the target population. In practice, this representativeness assumption can fail at two distinct stages, inducing selection bias. First, eligibility rules such as device constraints, software requirements, or user consent determine which clients are ever enrolled and reachable for training, inducing \emph{enrollment bias}. Second, among enrolled clients, user and system factors such as battery state, network status, and local time determine which clients participate in each communication round, inducing \emph{participation bias}. Although existing work has largely addressed round-level participation bias, it has paid far less attention to population-level enrollment bias, which can induce a persistent mismatch between the training objective and the target-population objective. We formalize FL under a two-stage selection model and derive \textsc{FedIPW}, an inverse-probability-weighted aggregation scheme that recovers the target-population mean update under standard ignorability and positivity assumptions. Because client-level covariates are often unavailable for non-enrolled clients, we also introduce a limited-information aggregate-calibration extension that uses known target-population summaries to reweight the enrolled sample, partially correcting enrollment bias. We further provide an algorithm-agnostic optimization analysis under residual weighting error and show that incomplete selection correction can induce a non-vanishing bias floor. Finally, experiments on synthetic federated logistic regression validate the predicted objective mismatch and show that enrollment correction reduces target-population error under two-stage selection.
format Preprint
id arxiv_https___arxiv_org_abs_2604_26604
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Who Trains Matters: Federated Learning under Enrollment and Participation Selection Biases
Morishita, Gota
Machine Learning
Federated learning (FL) trains a shared model from updates contributed by distributed clients, often implicitly assuming that contributing clients are representative of the target population. In practice, this representativeness assumption can fail at two distinct stages, inducing selection bias. First, eligibility rules such as device constraints, software requirements, or user consent determine which clients are ever enrolled and reachable for training, inducing \emph{enrollment bias}. Second, among enrolled clients, user and system factors such as battery state, network status, and local time determine which clients participate in each communication round, inducing \emph{participation bias}. Although existing work has largely addressed round-level participation bias, it has paid far less attention to population-level enrollment bias, which can induce a persistent mismatch between the training objective and the target-population objective. We formalize FL under a two-stage selection model and derive \textsc{FedIPW}, an inverse-probability-weighted aggregation scheme that recovers the target-population mean update under standard ignorability and positivity assumptions. Because client-level covariates are often unavailable for non-enrolled clients, we also introduce a limited-information aggregate-calibration extension that uses known target-population summaries to reweight the enrolled sample, partially correcting enrollment bias. We further provide an algorithm-agnostic optimization analysis under residual weighting error and show that incomplete selection correction can induce a non-vanishing bias floor. Finally, experiments on synthetic federated logistic regression validate the predicted objective mismatch and show that enrollment correction reduces target-population error under two-stage selection.
title Who Trains Matters: Federated Learning under Enrollment and Participation Selection Biases
topic Machine Learning
url https://arxiv.org/abs/2604.26604