MARC View: Extending Sheldon M. Ross's Method for Efficient Large-Scale Variance Computation :: Library Catalog

Bibliographic Details
Main Author:	Li, Jiawen
Format:	Preprint
Published:	2024
Subjects:	Computation Statistics Theory
Online Access:	https://arxiv.org/abs/2410.21922

_version_	1866910164146192384
author	Li, Jiawen
author_facet	Li, Jiawen
contents	We introduce Prior Knowledge Acceleration (PKA), a batch-update method for variance that reuses previously computed sufficient statistics to avoid full recomputation. The update identity is algebraically equivalent to the pairwise formula of Chan, Golub, and LeVeque (1983); our contribution is a runtime-cost analysis that derives an explicit acceleration factor $\tau_a$ and identifies the data-size regime where batch updating outperforms both naïve recomputation and Ross's single-sample method. We prove that Ross's approach is preferable only when the new batch contains a single sample ($N_2 = 1$). We further generalise the framework to covariance and other decomposable statistics. Benchmarks against Welford, Chan pairwise, and naïve two-pass baselines on synthetic and real-world streaming data confirm the theoretical predictions, with speedups of up to $454\times$ when the prior dataset is large relative to the new batch.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_21922
institution	arXiv
publishDate	2024
record_format	arxiv
title	Extending Sheldon M. Ross's Method for Efficient Large-Scale Variance Computation
topic	Computation Statistics Theory
url	https://arxiv.org/abs/2410.21922
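
Note on the abstract: the record does not reproduce the paper's own update formula or its definition of the acceleration factor, so the following is only a minimal sketch of the Chan, Golub, and LeVeque pairwise merge that the abstract says the PKA identity is equivalent to. The names `Stats`, `summarize`, and `merge` are hypothetical, and the use of the sample variance (divisor n - 1) is an assumption. The point it illustrates is that only the new batch is scanned; the prior data enters through its stored count, mean, and sum of squared deviations.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Stats:
    """Sufficient statistics for variance (hypothetical container)."""
    n: int        # number of samples
    mean: float   # arithmetic mean
    m2: float     # sum of squared deviations from the mean

    @property
    def variance(self) -> float:
        # Sample variance (divisor n - 1); this choice is an assumption.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def summarize(xs: Sequence[float]) -> Stats:
    """Two-pass summary of a single batch."""
    n = len(xs)
    if n == 0:
        return Stats(0, 0.0, 0.0)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    return Stats(n, mean, m2)


def merge(prior: Stats, batch: Stats) -> Stats:
    """Chan-Golub-LeVeque pairwise merge of two summaries.

    prior holds the N_1 previously seen samples, batch the N_2 new ones.
    """
    if prior.n == 0:
        return batch
    if batch.n == 0:
        return prior
    n = prior.n + batch.n
    delta = batch.mean - prior.mean
    mean = prior.mean + delta * batch.n / n
    m2 = prior.m2 + batch.m2 + delta * delta * prior.n * batch.n / n
    return Stats(n, mean, m2)


if __name__ == "__main__":
    prior_data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    new_batch = [1.0, 3.0]
    updated = merge(summarize(prior_data), summarize(new_batch))
    print(updated.n, updated.mean, updated.variance)
```

In use, `summarize(prior_data)` would be computed once and stored; each later batch then costs a pass over the new samples plus a constant-time merge, which is the regime the abstract describes as favourable when the prior dataset is large relative to the new batch.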
