Bibliographic Details
Main Authors: Shelton, Jacquelyn; Polewski, Przemyslaw; Robel, Alexander; Hoffman, Matthew; Price, Stephen
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2601.00915
author Shelton, Jacquelyn
Polewski, Przemyslaw
Robel, Alexander
Hoffman, Matthew
Price, Stephen
contents Large climate-model ensembles are computationally expensive, yet many downstream analyses would benefit from additional, statistically consistent realizations of spatiotemporal climate variables. We study a generative modeling approach for producing new realizations from a limited set of available runs by transferring structure learned across an ensemble. Using monthly near-surface temperature time series from ten independent reanalysis realizations (ERA5), we find that a vanilla conditional variational autoencoder (CVAE) trained jointly across realizations yields a fragmented latent space that fails to generalize to unseen ensemble members. To address this, we introduce a latent-constrained CVAE (LC-CVAE) that enforces cross-realization homogeneity of latent embeddings at a small set of shared geographic 'anchor' locations. We then use multi-output Gaussian process regression in the latent space to predict latent coordinates at unsampled locations in a new realization, followed by decoding to generate full time series fields. Experiments and ablations demonstrate (i) instability when training on a single realization, (ii) diminishing returns after incorporating roughly five realizations, and (iii) a trade-off between spatial coverage and reconstruction quality that is closely linked to the average neighbor distance in latent space.
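The two ideas named in the abstract, a cross-realization constraint on anchor embeddings and Gaussian process regression in latent space, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the squared-deviation form of the penalty, the RBF kernel, and the use of one shared kernel with independent latent dimensions (a simplification of a true multi-output GP) are all assumptions made for illustration, and coordinates are treated as plain Euclidean vectors.

```python
import numpy as np


def anchor_homogeneity_penalty(z_anchor):
    """Hypothetical penalty: spread of anchor embeddings across realizations.

    z_anchor has shape (R, A, D): R realizations, A anchor locations,
    D latent dimensions. The value is the mean squared distance of each
    embedding from the cross-realization centroid at the same anchor.
    """
    z_anchor = np.asarray(z_anchor, dtype=float)
    centroid = z_anchor.mean(axis=0, keepdims=True)  # shape (1, A, D)
    return float(np.mean(np.sum((z_anchor - centroid) ** 2, axis=-1)))


def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential kernel between two sets of coordinates."""
    sq_dist = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict_latents(coords_train, z_train, coords_new,
                       length_scale=1.0, noise=1e-6):
    """GP posterior mean of latent coordinates at new locations.

    Assumption: every latent dimension shares one kernel and is treated
    as an independent output, so a single linear solve covers all D.
    """
    coords_train = np.asarray(coords_train, dtype=float)
    coords_new = np.asarray(coords_new, dtype=float)
    z_train = np.asarray(z_train, dtype=float)
    k_train = rbf_kernel(coords_train, coords_train, length_scale)
    k_train += noise * np.eye(len(coords_train))
    k_cross = rbf_kernel(coords_new, coords_train, length_scale)
    return k_cross @ np.linalg.solve(k_train, z_train)
```

In a training loop, a penalty of this kind would be added to the usual CVAE objective with some weight; the predicted latents from `gp_predict_latents` would then be passed through the decoder to produce time series at unsampled locations. Both steps are outside the scope of this sketch.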
format Preprint
id arxiv_https___arxiv_org_abs_2601_00915
institution arXiv
publishDate 2026
record_format arxiv
title Latent-Constrained Conditional VAEs for Augmenting Large-Scale Climate Ensembles
topic Machine Learning
url https://arxiv.org/abs/2601.00915