Enregistré dans:
| Auteur principal: | |
|---|---|
| Format: | Preprint |
| Publié: |
2024
|
| Sujets: | |
| Accès en ligne: | https://arxiv.org/abs/2410.21922 |
| Tags: |
Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
|
| _version_ | 1866910164146192384 |
|---|---|
| author | Li, Jiawen |
| author_facet | Li, Jiawen |
| contents | We introduce Prior Knowledge Acceleration (PKA), a batch-update method for variance that reuses previously computed sufficient statistics to avoid full recomputation. The update identity is algebraically equivalent to the pairwise formula of Chan, Golub, and LeVeque (1983); our contribution is a runtime-cost analysis that derives an explicit acceleration factor $τ_a$ and identifies the data-size regime where batch updating outperforms both naïve recomputation and Ross's single-sample method. We prove that Ross's approach is preferable only when the new batch contains a single sample ($N_2 = 1$). We further generalise the framework to covariance and other decomposable statistics. Benchmarks against Welford, Chan pairwise, and naïve two-pass baselines on synthetic and real-world streaming data confirm the theoretical predictions, with speedups of up to $454\times$ when the prior dataset is large relative to the new batch. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2410_21922 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| spellingShingle | Extending Sheldon M. Ross's Method for Efficient Large-Scale Variance Computation Li, Jiawen Computation Statistics Theory We introduce Prior Knowledge Acceleration (PKA), a batch-update method for variance that reuses previously computed sufficient statistics to avoid full recomputation. The update identity is algebraically equivalent to the pairwise formula of Chan, Golub, and LeVeque (1983); our contribution is a runtime-cost analysis that derives an explicit acceleration factor $τ_a$ and identifies the data-size regime where batch updating outperforms both naïve recomputation and Ross's single-sample method. We prove that Ross's approach is preferable only when the new batch contains a single sample ($N_2 = 1$). We further generalise the framework to covariance and other decomposable statistics. Benchmarks against Welford, Chan pairwise, and naïve two-pass baselines on synthetic and real-world streaming data confirm the theoretical predictions, with speedups of up to $454\times$ when the prior dataset is large relative to the new batch. |
| title | Extending Sheldon M. Ross's Method for Efficient Large-Scale Variance Computation |
| topic | Computation Statistics Theory |
| url | https://arxiv.org/abs/2410.21922 |