Bibliographic Details
Main Authors: Andros, R. Jacob, Guhaniyogi, Rajarshi, Francom, Devin, Pasqualini, Donatella
Format: Preprint
Published: 2024
Subjects: Applications
Online Access: https://arxiv.org/abs/2406.18751
_version_ 1866910618994343936
author Andros, R. Jacob
Guhaniyogi, Rajarshi
Francom, Devin
Pasqualini, Donatella
contents In environmental studies, realistic simulations are essential for understanding complex systems. Statistical emulation with Gaussian processes (GPs) in functional data models has become a standard tool for this purpose. Traditional centralized processing of such models requires substantial computational and storage resources, which has motivated distributed Bayesian learning algorithms that partition data into shards for distributed computation. However, this raises concerns about the sensitivity of distributed inference to shard selection. Instead of using data shards, our approach employs multiple random matrices to create random linear projections, or sketches, of the dataset. Posterior inference on functional data models is conducted on the random data sketches in parallel on separate machines, and the individual inferences are then combined at a central server. Aggregating inference across random matrices makes our approach resilient to the selection of data sketches, resulting in robust distributed Bayesian learning. An important advantage of the approach is its ability to maintain the privacy of sampling units, as random sketches prevent the recovery of raw data. We highlight the significance of our approach through simulation examples and showcase its performance as an emulator using surrogates of the Sea, Lake, and Overland Surges from Hurricanes (SLOSH) simulator, an important simulator for government agencies.
format Preprint
id arxiv_https___arxiv_org_abs_2406_18751
institution arXiv
publishDate 2024
record_format arxiv
title Robust Distributed Learning of Functional Data From Simulators through Data Sketching
topic Applications
url https://arxiv.org/abs/2406.18751
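
The workflow summarized in the abstract (sketch the data with several random matrices, fit each sketch separately, combine the fits centrally) can be illustrated with a toy example. The block below is only a minimal sketch under stated assumptions and is not the authors' method: it swaps the functional GP model for a one-coefficient conjugate Bayesian regression, draws sketching matrices with i.i.d. N(0, 1/m) entries, simulates the machines with a loop, and combines results by averaging posterior means. The abstract does not specify these choices, and all function names are hypothetical.

```python
import random

def make_sketch(m, n, rng):
    """Return an m x n matrix with i.i.d. N(0, 1/m) entries (an assumed choice)."""
    scale = (1.0 / m) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]

def apply_sketch(phi, v):
    """Compute the random linear projection phi @ v for a list-of-lists phi."""
    return [sum(p * x for p, x in zip(row, v)) for row in phi]

def posterior_on_sketch(xs, ys, noise_var=1.0, prior_var=100.0):
    """Conjugate posterior (mean, variance) of beta in ys = beta * xs + noise.

    Toy stand-in for the functional data model; noise_var and prior_var
    are illustrative values, not taken from the source.
    """
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_var) / precision
    return mean, 1.0 / precision

def sketched_inference(x, y, m, num_machines, seed=0):
    """Fit one sketch per simulated machine, then average the posterior means."""
    rng = random.Random(seed)
    fits = []
    for _ in range(num_machines):
        phi = make_sketch(m, len(x), rng)  # a fresh random matrix per machine
        fits.append(posterior_on_sketch(apply_sketch(phi, x), apply_sketch(phi, y)))
    # Simple averaging is an assumed aggregation rule for this illustration.
    combined_mean = sum(mean for mean, _ in fits) / len(fits)
    return combined_mean, fits

# Synthetic data for illustration only: y = 2 * x + small Gaussian noise.
data_rng = random.Random(1)
x = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
y = [2.0 * xi + data_rng.gauss(0.0, 0.1) for xi in x]

combined_mean, fits = sketched_inference(x, y, m=20, num_machines=5)
print("combined posterior mean:", combined_mean)
```

Each simulated machine sees only the m = 20 projected values rather than the 200 raw observations, which is the sense in which a sketch compresses the sampling units. Averaging over several independent random matrices reduces the dependence of the result on any single sketch.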