Gespeichert in:
| Hauptverfasser: | , |
|---|---|
| Format: | Preprint |
| Veröffentlicht: |
2026
|
| Schlagworte: | |
| Online-Zugang: | https://arxiv.org/abs/2603.11216 |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| _version_ | 1866918383715352576 |
|---|---|
| author | Liu, Kaiwen Zhang, Qin |
| author_facet | Liu, Kaiwen Zhang, Qin |
| contents | We propose a novel framework for statistical estimation on noisy datasets. Within this framework, we focus on the frequency moments ($F_p$) problem and demonstrate that it is possible to approximate $F_p$ of the unknown ground-truth dataset using sublinear space in the data stream model and sublinear communication in the coordinator model, provided that the approximation ratio is parameterized by a data-dependent quantity, which we call the $F_p$-mismatch-ambiguity. We also establish a set of lower bounds, which are tight in terms of the input size. Our results yield several interesting insights: (1) In the data stream model, the $F_p$ problem is inherently more difficult in the noisy setting than in the noiseless one. In particular, while $F_2$ can be approximated in logarithmic space in terms of the input size in the noiseless setting, any algorithm for $F_2$ in the noisy setting requires polynomial space. (2) In the coordinator model, in sharp contrast to the noiseless case, achieving polylogarithmic communication in the input size is generally impossible for $F_p$ under noise. However, when the $F_p$ mismatch ambiguity falls below a certain threshold, it becomes possible to achieve communication that is entirely independent of the input size. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2603_11216 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| spellingShingle | Frequency Moments in Noisy Streaming and Distributed Data under Mismatch Ambiguity Liu, Kaiwen Zhang, Qin Data Structures and Algorithms Databases We propose a novel framework for statistical estimation on noisy datasets. Within this framework, we focus on the frequency moments ($F_p$) problem and demonstrate that it is possible to approximate $F_p$ of the unknown ground-truth dataset using sublinear space in the data stream model and sublinear communication in the coordinator model, provided that the approximation ratio is parameterized by a data-dependent quantity, which we call the $F_p$-mismatch-ambiguity. We also establish a set of lower bounds, which are tight in terms of the input size. Our results yield several interesting insights: (1) In the data stream model, the $F_p$ problem is inherently more difficult in the noisy setting than in the noiseless one. In particular, while $F_2$ can be approximated in logarithmic space in terms of the input size in the noiseless setting, any algorithm for $F_2$ in the noisy setting requires polynomial space. (2) In the coordinator model, in sharp contrast to the noiseless case, achieving polylogarithmic communication in the input size is generally impossible for $F_p$ under noise. However, when the $F_p$ mismatch ambiguity falls below a certain threshold, it becomes possible to achieve communication that is entirely independent of the input size. |
| title | Frequency Moments in Noisy Streaming and Distributed Data under Mismatch Ambiguity |
| topic | Data Structures and Algorithms Databases |
| url | https://arxiv.org/abs/2603.11216 |