Bibliographic Details
Main Authors: Vazquez, Jesus E.; Shen, Yicheng; Akulian, Jason; Hochberg, Chad; Iwashyna, Theodore J.; Stuart, Elizabeth A.; Tong, Jiayi
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2605.20125
contents Privacy constraints have driven the rise of federated learning (FL), which enables multi-site analyses without sharing individual participant data. We develop a framework for FL with missing data, identifying conditions under which the complete case (CC) estimator is preferred over the inverse probability weighting (IPW) estimator. For settings where the CC estimator fails, we introduce a calibrated weight estimation approach that combines candidate weighting models across sites and remains consistent if at least one is correctly specified. Consistency conditions are stated at the site level, ensuring that the federated estimator inherits validity from local properties. We derive a sandwich variance estimator that accounts for uncertainty in weight estimation, and illustrate the framework by evaluating risk factors for 90-day mortality among patients with pleural infections treated with intrapleural enzyme therapy.
format Preprint
id arxiv_https___arxiv_org_abs_2605_20125
institution arXiv
publishDate 2026
record_format arxiv
title Federated Learning with Incomplete Data: When to Use Complete Cases and When to Weight
topic Methodology
Statistics Theory
url https://arxiv.org/abs/2605.20125
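
The contrast between the complete case (CC) and inverse probability weighting (IPW) estimators named in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes the observation probabilities are known rather than estimated, targets a simple marginal mean, and pools sites by exchanging only two aggregate sums per site. The function names `site_summaries` and `pooled_mean` and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple


def site_summaries(
    y: List[Optional[float]], obs_prob: List[float]
) -> Tuple[float, float]:
    """Return (sum of w*y, sum of w) over the observed outcomes at one site.

    A missing outcome is encoded as None. Each observed unit gets weight
    w = 1 / obs_prob. Only these two sums leave the site (an assumption
    made for this sketch, not the paper's protocol).
    """
    num = 0.0
    den = 0.0
    for yi, p in zip(y, obs_prob):
        if yi is not None:
            w = 1.0 / p
            num += w * yi
            den += w
    return num, den


def pooled_mean(summaries: List[Tuple[float, float]]) -> float:
    """Combine site-level sums into one normalized (Hajek-type) weighted mean."""
    total_num = sum(s[0] for s in summaries)
    total_den = sum(s[1] for s in summaries)
    return total_num / total_den


# Toy data (hypothetical): two identical sites. Low outcomes (1.0) are always
# observed; high outcomes (3.0) are observed with probability 0.5, and one of
# the two high outcomes per site is missing. The full-data mean is 2.0.
sites = [
    ([1.0, 1.0, 3.0, None], [1.0, 1.0, 0.5, 0.5]),
    ([1.0, 1.0, 3.0, None], [1.0, 1.0, 0.5, 0.5]),
]

# Complete case: every observed unit gets weight 1.
cc_estimate = pooled_mean([site_summaries(y, [1.0] * len(y)) for y, _ in sites])

# IPW: observed units are weighted by the inverse of their observation probability.
ipw_estimate = pooled_mean([site_summaries(y, p) for y, p in sites])

print(f"CC estimate:  {cc_estimate:.3f}")
print(f"IPW estimate: {ipw_estimate:.3f}")
```

In this toy setting the CC estimate is 5/3, below the full-data mean of 2, because high outcomes are under-represented among the observed units, while the weighted estimate recovers 2. The abstract notes that there are also conditions under which the CC estimator is preferred; this sketch does not cover those.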