Bibliographic Details
Main Author: Li, Jiawen
Format: Preprint
Published: 2024
Subjects: Computation; Statistics Theory
Online Access: https://arxiv.org/abs/2410.21922
_version_ 1866910164146192384
author Li, Jiawen
author_facet Li, Jiawen
contents We introduce Prior Knowledge Acceleration (PKA), a batch-update method for variance that reuses previously computed sufficient statistics to avoid full recomputation. The update identity is algebraically equivalent to the pairwise formula of Chan, Golub, and LeVeque (1983); our contribution is a runtime-cost analysis that derives an explicit acceleration factor $\tau_a$ and identifies the data-size regime where batch updating outperforms both naïve recomputation and Ross's single-sample method. We prove that Ross's approach is preferable only when the new batch contains a single sample ($N_2 = 1$). We further generalise the framework to covariance and other decomposable statistics. Benchmarks against Welford, Chan pairwise, and naïve two-pass baselines on synthetic and real-world streaming data confirm the theoretical predictions, with speedups of up to $454\times$ when the prior dataset is large relative to the new batch.
format Preprint
id arxiv_https___arxiv_org_abs_2410_21922
institution arXiv
publishDate 2024
record_format arxiv
title Extending Sheldon M. Ross's Method for Efficient Large-Scale Variance Computation
topic Computation
Statistics Theory
url https://arxiv.org/abs/2410.21922
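
The abstract states that the PKA update identity is algebraically equivalent to the pairwise formula of Chan, Golub, and LeVeque (1983). The sketch below illustrates that pairwise merge, not the paper's own implementation: the names `Stats`, `summarize`, and `merge` are hypothetical, and the choice of (count, mean, sum of squared deviations) as the stored sufficient statistics is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Stats:
    """Hypothetical container for the sufficient statistics of a sample."""
    n: int        # number of samples
    mean: float   # sample mean
    m2: float     # sum of squared deviations from the mean

    @property
    def variance(self) -> float:
        # Unbiased sample variance; needs at least two samples.
        if self.n < 2:
            raise ValueError("variance needs at least two samples")
        return self.m2 / (self.n - 1)


def summarize(xs: Iterable[float]) -> Stats:
    """Two-pass summary of one batch (the naive baseline)."""
    data = list(xs)
    n = len(data)
    if n == 0:
        return Stats(0, 0.0, 0.0)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data)
    return Stats(n, mean, m2)


def merge(prior: Stats, batch: Stats) -> Stats:
    """Pairwise merge of two summaries (Chan, Golub, LeVeque style)."""
    n = prior.n + batch.n
    if n == 0:
        return Stats(0, 0.0, 0.0)
    delta = batch.mean - prior.mean
    mean = prior.mean + delta * batch.n / n
    # Cross term accounts for the shift between the two batch means.
    m2 = prior.m2 + batch.m2 + delta * delta * prior.n * batch.n / n
    return Stats(n, mean, m2)


# Usage: summarize the prior data once, then fold in a new batch
# without revisiting the prior samples.
prior_data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
new_batch = [1.0, 3.0]
updated = merge(summarize(prior_data), summarize(new_batch))
recomputed = summarize(prior_data + new_batch)
```

In this sketch the merge step reads only the new batch plus a constant number of stored values, so its cost does not grow with the size of the prior data. That is consistent with the regime the abstract describes, where the prior dataset is large relative to the new batch.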