Saved in:
Bibliographic Details
Main Authors: Diakonikolas, Ilias, Kane, Daniel M., Lee, Jasper C. H., Pittas, Thanasis, Woodruff, David P., Zhou, Samson
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2506.22608
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866911027246923776
author Diakonikolas, Ilias
Kane, Daniel M.
Lee, Jasper C. H.
Pittas, Thanasis
Woodruff, David P.
Zhou, Samson
author_facet Diakonikolas, Ilias
Kane, Daniel M.
Lee, Jasper C. H.
Pittas, Thanasis
Woodruff, David P.
Zhou, Samson
contents We study the problem of distributed distinct element estimation, where $α$ servers each receive a subset of a universe $[n]$ and aim to compute a $(1+\varepsilon)$-approximation to the number of distinct elements using minimal communication. While prior work establishes a worst-case bound of $Θ\left(α\log n+\fracα{\varepsilon^2}\right)$ bits, these results rely on assumptions that may not hold in practice. We introduce a new parameterization based on the number $C = \fracβ{\varepsilon^2}$ of pairwise collisions, i.e., instances where the same element appears on multiple servers, and design a protocol that uses only $\mathcal{O}\left(α\log n+\frac{\sqrtβ}{\varepsilon^2} \log n\right)$ bits, breaking previous lower bounds when $C$ is small. We further improve our algorithm under assumptions on the number of distinct elements or collisions and provide matching lower bounds in all regimes, establishing $C$ as a tight complexity measure for the problem. Finally, we consider streaming algorithms for distinct element estimation parameterized by the number of items with frequency larger than $1$. Overall, our results offer insight into why statistical problems with known hardness results can be efficiently solved in practice.
format Preprint
id arxiv_https___arxiv_org_abs_2506_22608
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle On Fine-Grained Distinct Element Estimation
Diakonikolas, Ilias
Kane, Daniel M.
Lee, Jasper C. H.
Pittas, Thanasis
Woodruff, David P.
Zhou, Samson
Data Structures and Algorithms
We study the problem of distributed distinct element estimation, where $α$ servers each receive a subset of a universe $[n]$ and aim to compute a $(1+\varepsilon)$-approximation to the number of distinct elements using minimal communication. While prior work establishes a worst-case bound of $Θ\left(α\log n+\fracα{\varepsilon^2}\right)$ bits, these results rely on assumptions that may not hold in practice. We introduce a new parameterization based on the number $C = \fracβ{\varepsilon^2}$ of pairwise collisions, i.e., instances where the same element appears on multiple servers, and design a protocol that uses only $\mathcal{O}\left(α\log n+\frac{\sqrtβ}{\varepsilon^2} \log n\right)$ bits, breaking previous lower bounds when $C$ is small. We further improve our algorithm under assumptions on the number of distinct elements or collisions and provide matching lower bounds in all regimes, establishing $C$ as a tight complexity measure for the problem. Finally, we consider streaming algorithms for distinct element estimation parameterized by the number of items with frequency larger than $1$. Overall, our results offer insight into why statistical problems with known hardness results can be efficiently solved in practice.
title On Fine-Grained Distinct Element Estimation
topic Data Structures and Algorithms
url https://arxiv.org/abs/2506.22608