MARC21: :: Library Catalog

Bibliographic Details
Main Author:	Crowhurst, Mike
Format:	Preprint
Published:	2026
Subjects:	Methodology Computation
Online Access:	https://arxiv.org/abs/2605.18691

_version_	1866918509816053760
author	Crowhurst, Mike
author_facet	Crowhurst, Mike
contents	The Central Limit Theorem provides a foundation for inferential statistics and hypothesis testing. It describes how standardized statistics behave under repeated sampling from large populations. However, if the size of the sample (n) becomes so large that it approaches the size of the population (N), sampling variability becomes very small, and standard errors and margins of error both approach zero. The purpose of this project was to investigate the behavior of estimators as the sampling fraction (f = n/N) approaches 1, motivated by modern data streams from administrative records, transaction logs, sensor systems, and institutional databases that capture large portions of finite populations. We constructed two finite populations with known parameters and drew repeated samples across a range of sampling fractions. We then examined the resulting randomization distributions of the sample mean to understand how sampling variability collapses. Additional experiments were conducted using various CPU- and GPU-based methods to evaluate the deviation of the sample mean from the defined population mean under different computational conditions. The results confirm that sampling variability diminishes as expected under finite population theory and becomes negligible well before full enumeration is reached. Once sampling variability is minimized, remaining deviations in estimators are primarily related to numerical precision and computational structure rather than random sampling. These findings support a reassessment of inferential assumptions in high-coverage, large-scale data settings.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_18691
institution	arXiv
publishDate	2026
record_format	arxiv
title	Finite Population Sampling as n to N: Empirical Evidence for the Transition from Inference to Accuracy
topic	Methodology Computation
url	https://arxiv.org/abs/2605.18691
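
The abstract's central claim, that the standard error of the sample mean collapses as the sampling fraction f = n/N approaches 1, follows from the standard finite population correction for simple random sampling without replacement: SE = (S / sqrt(n)) * sqrt(1 - n/N). The sketch below is an illustration under assumptions, not the preprint's code: the Gaussian population, its size, the replication count, and the function names `theoretical_se` and `empirical_se` are all hypothetical choices made for this example.

```python
import math
import random
import statistics


def theoretical_se(pop_sd, n, N):
    """Standard error of the sample mean under simple random sampling
    without replacement, where pop_sd is computed with the N-1 divisor."""
    return pop_sd / math.sqrt(n) * math.sqrt(1.0 - n / N)


def empirical_se(population, n, reps, rng):
    """Standard deviation of the sample mean over repeated draws of size n
    taken without replacement from the finite population."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.pstdev(means)


def main():
    rng = random.Random(0)  # fixed seed so the run is repeatable
    N = 2000                # hypothetical population size
    # Hypothetical finite population; the preprint's populations are not reproduced here.
    population = [rng.gauss(50.0, 10.0) for _ in range(N)]
    S = statistics.stdev(population)  # N-1 divisor, matching the formula above

    print("f      n     theoretical_se  empirical_se")
    for f in (0.10, 0.50, 0.90, 0.99, 1.00):
        n = round(f * N)
        theo = theoretical_se(S, n, N)
        emp = empirical_se(population, n, reps=400, rng=rng)
        print(f"{f:<6.2f} {n:<5d} {theo:<15.6f} {emp:.6f}")


if __name__ == "__main__":
    main()
```

Running the script prints the theoretical and simulated standard errors side by side for each sampling fraction. At f = 1 the formula gives exactly zero, so any deviation that remains in a computed mean at that point comes from how the arithmetic is carried out rather than from sampling.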
