Saved in:
Bibliographic Details
Main Authors: Lee, Duncan, Davies, Vinny
Format: Preprint
Published: 2026
Subjects:
Online Access:https://arxiv.org/abs/2602.02277
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866911416517132288
author Lee, Duncan
Davies, Vinny
author_facet Lee, Duncan
Davies, Vinny
contents Spatial epidemiology identifies the drivers of elevated population-level disease risks, using disease counts, exposures and known confounders at the areal unit level. Poisson regression models are typically used for inference, which incorporate a linear/additive regression component and allow for unmeasured confounding via a set of spatially autocorrelated random effects. This approach requires the confounder interactions and their functional relationships with disease risk to be specified in advance, rather than being learned from the data. Therefore, this paper proposes the SPAR-Forest-ERF algorithm, which is the first fusion of random forests for capturing non-linear and interacting confounder-response effects with Bayesian spatial autocorrelation models that can estimate interpretable exposure response functions (ERF) with full uncertainty quantification. Methodologically, we extend existing methods set in a prediction context by propagating uncertainty between both the ML and statistical models, developing a new stopping criteria designed to ensure the stability of the primary inferential target, and incorporating a range of different ERFs for maximum model flexibility. This methodology is motivated by a new study quantifying the impact of air pollution concentrations on self-rated health in Scotland, using data from the recently released 2022 national census.
format Preprint
id arxiv_https___arxiv_org_abs_2602_02277
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle A spatial random forest algorithm for population-level epidemiological risk assessment
Lee, Duncan
Davies, Vinny
Methodology
Spatial epidemiology identifies the drivers of elevated population-level disease risks, using disease counts, exposures and known confounders at the areal unit level. Poisson regression models are typically used for inference, which incorporate a linear/additive regression component and allow for unmeasured confounding via a set of spatially autocorrelated random effects. This approach requires the confounder interactions and their functional relationships with disease risk to be specified in advance, rather than being learned from the data. Therefore, this paper proposes the SPAR-Forest-ERF algorithm, which is the first fusion of random forests for capturing non-linear and interacting confounder-response effects with Bayesian spatial autocorrelation models that can estimate interpretable exposure response functions (ERF) with full uncertainty quantification. Methodologically, we extend existing methods set in a prediction context by propagating uncertainty between both the ML and statistical models, developing a new stopping criteria designed to ensure the stability of the primary inferential target, and incorporating a range of different ERFs for maximum model flexibility. This methodology is motivated by a new study quantifying the impact of air pollution concentrations on self-rated health in Scotland, using data from the recently released 2022 national census.
title A spatial random forest algorithm for population-level epidemiological risk assessment
topic Methodology
url https://arxiv.org/abs/2602.02277