Staff View: UD-DML: Uniform Design Subsampling for Double Machine Learning over Massive Data :: Library Catalog

Bibliographic Details
Main Authors:	Qu, Yuanke; Xu, Xiaoya; Zhang, Hengtao
Format:	Preprint
Published:	2026
Subjects:	Methodology
Online Access:	https://arxiv.org/abs/2605.05772

author	Qu, Yuanke; Xu, Xiaoya; Zhang, Hengtao
contents	Double machine learning (DML) delivers valid inference on low-dimensional causal parameters while permitting flexible nuisance estimation, but its computational cost becomes prohibitive once cross-fitted learners must be trained on massive observational data. Applying DML to a uniformly drawn subsample alleviates this burden, yet such a reduction ignores the geometry of the covariate space and can worsen treated-control imbalance and overlap deficiency. We propose Uniform Design Double Machine Learning (UD-DML), a design-based subsampling strategy for average treatment effect (ATE) estimation. UD-DML first constructs a low-discrepancy skeleton in a PCA-rotated covariate space under the mixture-discrepancy criterion and then uses KD-tree search to assign each skeleton point its nearest treated unit and its nearest control unit. The resulting matched subsample is, by construction, both representative of the full covariate distribution and balanced across treatment arms; cross-fitted DML is then applied to it. We establish discrepancy-based guarantees for representativeness and balance, and prove that the UD-DML estimator is $\sqrt{r}$-asymptotically normal under mild conditions, where $r \ll n$ is the size of the selected subsample and $n$ is the full sample size. The dominant nuisance-fitting cost therefore scales with $r$ rather than $n$. Monte Carlo experiments show that UD-DML attains lower RMSE, narrower confidence intervals, and more reliable coverage than uniform subsampling, with the largest gains in low-overlap and misspecified regimes. An application to a large observational dataset further demonstrates its practical feasibility.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_05772
institution	arXiv
publishDate	2026
record_format	arxiv
title	UD-DML: Uniform Design Subsampling for Double Machine Learning over Massive Data
topic	Methodology
url	https://arxiv.org/abs/2605.05772
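The pipeline summarized in the abstract (PCA rotation, a low-discrepancy skeleton chosen under mixture discrepancy, KD-tree matching of each skeleton point to its nearest treated and nearest control unit, then cross-fitted DML on the matched subsample) can be illustrated with the minimal Python sketch below. It is not the authors' implementation. The function names (`mixture_discrepancy_sq`, `skeleton`, `ud_subsample`, `dml_ate`) are hypothetical, and several choices are assumptions: the skeleton is simply the best of a few random Latin hypercubes rather than an optimized uniform design, PCA scores are mapped to [0, 1] through their empirical CDFs, a unit matched to several skeleton points is kept once, and the nuisance models are plain linear and logistic regressions inside an AIPW (doubly robust) score with a 1.96 Wald interval. It relies on NumPy and `scipy.spatial.cKDTree`.

```python
import numpy as np
from scipy.spatial import cKDTree


def mixture_discrepancy_sq(P):
    """Squared mixture discrepancy of a point set P in [0, 1]^s (closed form)."""
    n, s = P.shape
    a = np.abs(P - 0.5)                                   # |x_ik - 1/2|
    t1 = np.prod(5 / 3 - a / 4 - a**2 / 4, axis=1).sum()
    d = np.abs(P[:, None, :] - P[None, :, :])             # |x_ik - x_jk|
    pair = 15 / 8 - a[:, None, :] / 4 - a[None, :, :] / 4 - 3 * d / 4 + d**2 / 2
    t2 = np.prod(pair, axis=2).sum()
    return (19 / 12) ** s - 2 * t1 / n + t2 / n**2


def skeleton(m, s, n_candidates, rng):
    """Among random Latin hypercubes, keep the one with the smallest mixture
    discrepancy (a crude stand-in for a proper uniform-design search)."""
    best, best_md = None, np.inf
    for _ in range(n_candidates):
        perms = np.argsort(rng.random((m, s)), axis=0)    # one permutation per column
        P = (perms + rng.random((m, s))) / m
        md = mixture_discrepancy_sq(P)
        if md < best_md:
            best, best_md = P, md
    return best


def ud_subsample(X, T, m, n_components=None, n_candidates=20, seed=0):
    """Hypothetical helper: indices of the nearest treated and nearest control
    unit for every skeleton point, with repeated units kept once."""
    rng = np.random.default_rng(seed)
    Z = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)      # PCA rotation
    k = n_components or X.shape[1]
    S = Z @ Vt[:k].T                                      # leading PCA scores
    # Assumption: send each score to [0, 1] through its empirical CDF (ranks)
    U = (np.argsort(np.argsort(S, axis=0), axis=0) + 0.5) / len(S)
    D = skeleton(m, k, n_candidates, rng)
    matched = []
    for arm in (1, 0):
        pool = np.flatnonzero(T == arm)
        _, j = cKDTree(U[pool]).query(D, k=1)             # nearest unit in this arm
        matched.append(pool[j])
    return np.unique(np.concatenate(matched))


def _design(X):
    return np.column_stack([np.ones(len(X)), X])


def _ols(X, y):
    return np.linalg.lstsq(_design(X), y, rcond=None)[0]


def _logit(X, t, iters=25):
    """Logistic regression by Newton-Raphson with a tiny ridge for stability."""
    A = _design(X)
    b = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-np.clip(A @ b, -30, 30)))
        H = A.T @ (A * (p * (1 - p))[:, None]) + 1e-6 * np.eye(A.shape[1])
        b += np.linalg.solve(H, A.T @ (t - p))
    return b


def dml_ate(X, T, Y, folds=5, clip=0.01, seed=0):
    """Cross-fitted AIPW estimate of the ATE with a Wald 95% interval."""
    r = len(Y)
    fold = np.random.default_rng(seed).permutation(r) % folds
    psi = np.empty(r)
    for f in range(folds):
        tr, te = fold != f, fold == f
        b1 = _ols(X[tr & (T == 1)], Y[tr & (T == 1)])     # outcome model, treated
        b0 = _ols(X[tr & (T == 0)], Y[tr & (T == 0)])     # outcome model, control
        A = _design(X[te])
        e = 1 / (1 + np.exp(-np.clip(A @ _logit(X[tr], T[tr]), -30, 30)))
        e = np.clip(e, clip, 1 - clip)                    # trim extreme propensities
        m1, m0, t, y = A @ b1, A @ b0, T[te], Y[te]
        psi[te] = m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e)
    theta = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(r)
    return theta, (theta - 1.96 * se, theta + 1.96 * se)


# Toy run on simulated data with a constant treatment effect of 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(20_000, 5))
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 2.0 * T + X.sum(axis=1) + rng.normal(size=len(X))
sub = ud_subsample(X, T, m=200)
theta, ci = dml_ate(X[sub], T[sub], Y[sub])
print(len(sub), theta, ci)
```

In this sketch only the at most 2m matched rows enter the nuisance fits, so their cost grows with the subsample size rather than with n; the remaining full-data work is one SVD, the rank transform, and the KD-tree builds and queries. More flexible learners could replace `_ols` and `_logit` inside `dml_ate` without changing the cross-fitting pattern.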