Saved in:
Bibliographic Details
Main Author: Zheng, Sijie
Format: Preprint
Published: 2026
Subjects:
Online Access:https://arxiv.org/abs/2601.12321
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866912831626018816
author Zheng, Sijie
author_facet Zheng, Sijie
contents Surface ozone pollution remains a persistent challenge in many metropolitan regions worldwide, as the nonlinear dependence of ozone formation on nitrogen oxides and volatile organic compounds (VOCs) complicates the design of effective emission control strategies. While chemical transport models provide mechanistic insights, they rely on detailed emission inventories and are computationally expensive. This study develops a machine learning--based surrogate framework inspired by the Empirical Kinetic Modeling Approach (EKMA). Using hourly air quality observations from Los Angeles during 2024--2025, a random forest model is trained to predict surface ozone concentrations based on precursor measurements and spatiotemporal features, including site location and cyclic time encodings. The model achieves strong predictive performance, with permutation importance highlighting the dominant roles of diurnal temporal features and nitrogen dioxide, along with additional contributions from carbon monoxide. Building on the trained surrogate, EKMA-style sensitivity experiments are conducted by perturbing precursor concentrations while holding other covariates fixed. The results indicate that ozone formation in Los Angeles during the study period is predominantly VOC-limited. Overall, the proposed framework offers an efficient and interpretable approach for ozone regime diagnosis in data-rich urban environments.
format Preprint
id arxiv_https___arxiv_org_abs_2601_12321
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle A Machine Learning--Based Surrogate EKMA Framework for Diagnosing Urban Ozone Formation Regimes: Evidence from Los Angeles
Zheng, Sijie
Applications
Surface ozone pollution remains a persistent challenge in many metropolitan regions worldwide, as the nonlinear dependence of ozone formation on nitrogen oxides and volatile organic compounds (VOCs) complicates the design of effective emission control strategies. While chemical transport models provide mechanistic insights, they rely on detailed emission inventories and are computationally expensive. This study develops a machine learning--based surrogate framework inspired by the Empirical Kinetic Modeling Approach (EKMA). Using hourly air quality observations from Los Angeles during 2024--2025, a random forest model is trained to predict surface ozone concentrations based on precursor measurements and spatiotemporal features, including site location and cyclic time encodings. The model achieves strong predictive performance, with permutation importance highlighting the dominant roles of diurnal temporal features and nitrogen dioxide, along with additional contributions from carbon monoxide. Building on the trained surrogate, EKMA-style sensitivity experiments are conducted by perturbing precursor concentrations while holding other covariates fixed. The results indicate that ozone formation in Los Angeles during the study period is predominantly VOC-limited. Overall, the proposed framework offers an efficient and interpretable approach for ozone regime diagnosis in data-rich urban environments.
title A Machine Learning--Based Surrogate EKMA Framework for Diagnosing Urban Ozone Formation Regimes: Evidence from Los Angeles
topic Applications
url https://arxiv.org/abs/2601.12321