Staff View: Latent-Constrained Conditional VAEs for Augmenting Large-Scale Climate Ensembles :: Library Catalog

Bibliographic Details
Main Authors:	Shelton, Jacquelyn; Polewski, Przemyslaw; Robel, Alexander; Hoffman, Matthew; Price, Stephen
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.00915

_version_	1866915704787173376
author	Shelton, Jacquelyn; Polewski, Przemyslaw; Robel, Alexander; Hoffman, Matthew; Price, Stephen
contents	Large climate-model ensembles are computationally expensive, yet many downstream analyses would benefit from additional, statistically consistent realizations of spatiotemporal climate variables. We study a generative modeling approach for producing new realizations from a limited set of available runs by transferring structure learned across an ensemble. Using monthly near-surface temperature time series from ten independent reanalysis realizations (ERA5), we find that a vanilla conditional variational autoencoder (CVAE) trained jointly across realizations yields a fragmented latent space that fails to generalize to unseen ensemble members. To address this, we introduce a latent-constrained CVAE (LC-CVAE) that enforces cross-realization homogeneity of latent embeddings at a small set of shared geographic 'anchor' locations. We then use multi-output Gaussian process regression in the latent space to predict latent coordinates at unsampled locations in a new realization, and decode these coordinates to generate full time-series fields. Experiments and ablations demonstrate (i) instability when training on a single realization, (ii) diminishing returns after incorporating roughly five realizations, and (iii) a trade-off between spatial coverage and reconstruction quality that is closely linked to the average neighbor distance in latent space.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_00915
institution	arXiv
publishDate	2026
record_format	arxiv
title	Latent-Constrained Conditional VAEs for Augmenting Large-Scale Climate Ensembles
topic	Machine Learning
url	https://arxiv.org/abs/2601.00915
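The contents field describes two computational steps: penalizing disagreement between realizations' latent embeddings at shared anchor locations, and regressing latent coordinates at unsampled locations with a Gaussian process before decoding. This record does not give the paper's exact loss, kernel, or architecture, so what follows is only a minimal NumPy sketch under stated assumptions. The homogeneity penalty is assumed to be the mean squared deviation from the cross-realization mean at each anchor. The GP uses one RBF kernel shared by independent latent dimensions, which is a simplification and not a coregionalized multi-output GP. Locations are treated as planar coordinates. The function names `anchor_homogeneity_penalty`, `gp_predict_latent`, and `mean_neighbor_distance` are hypothetical. The last one shows one plausible reading of the "average neighbor distance in latent space" mentioned in the abstract.

```python
import numpy as np


def anchor_homogeneity_penalty(z_anchor: np.ndarray) -> float:
    """Hypothetical cross-realization homogeneity penalty (assumed form).

    z_anchor has shape (R, A, D): latent embeddings for R realizations at the
    same A anchor locations, each D-dimensional. Returns the mean squared
    Euclidean deviation from the cross-realization average at each anchor,
    which is zero when all realizations agree.
    """
    centroid = z_anchor.mean(axis=0, keepdims=True)  # shape (1, A, D)
    return float(np.mean(np.sum((z_anchor - centroid) ** 2, axis=-1)))


def rbf_kernel(a: np.ndarray, b: np.ndarray, lengthscale: float) -> np.ndarray:
    """Squared-exponential kernel between rows of a (N, 2) and b (M, 2)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_predict_latent(x_train, z_train, x_test, lengthscale=1.0, noise=1e-4):
    """GP posterior mean of latent coordinates at unsampled locations.

    x_train: (N, 2) locations with known latent codes z_train: (N, D).
    x_test:  (M, 2) locations to predict. Simplification (assumption): all D
    latent dimensions share one kernel and are independent outputs, so there
    is no learned cross-output covariance as in a full multi-output GP.
    """
    k_train = rbf_kernel(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    k_cross = rbf_kernel(x_test, x_train, lengthscale)
    # Solve K alpha = Z for every latent dimension at once.
    alpha = np.linalg.solve(k_train, z_train)
    return k_cross @ alpha  # shape (M, D)


def mean_neighbor_distance(z: np.ndarray) -> float:
    """Average Euclidean distance from each latent point to its nearest neighbor."""
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude each point's distance to itself
    return float(dist.min(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(5, 8, 3))  # 5 realizations, 8 anchors, 3 latent dims
    print("homogeneity penalty:", anchor_homogeneity_penalty(z))
    locs = rng.uniform(0.0, 3.0, size=(8, 2))
    preds = gp_predict_latent(locs, z[0], rng.uniform(0.0, 3.0, size=(4, 2)))
    print("predicted latent codes shape:", preds.shape)
    print("mean neighbor distance:", mean_neighbor_distance(z[0]))
```

In a training loop, such a penalty would be added, with some weight, to the usual CVAE reconstruction and KL terms. The GP posterior mean would then supply latent codes that a trained decoder turns into time series. Both of these steps are outside this sketch.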

Similar Items