Saved in:
Bibliographic Details
Main Authors: Łukasiewicz, Aleksander, Tětek, Jakub, Veselý, Pavel
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2504.01206
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866915489596309504
author Łukasiewicz, Aleksander
Tětek, Jakub
Veselý, Pavel
author_facet Łukasiewicz, Aleksander
Tětek, Jakub
Veselý, Pavel
contents Space-efficient streaming estimation of quantiles in massive datasets is a fundamental problem with numerous applications in data monitoring and analysis. While theoretical research led to optimal algorithms, such as the Greenwald-Khanna algorithm or the KLL sketch, practitioners often use other sketches that perform significantly better in practice but lack theoretical guarantees. Most notably, the widely used $t$-digest has unbounded worst-case error. In this paper, we seek to get the best of both worlds. We present a new quantile summary, SplineSketch, for numeric data, offering near-optimal theoretical guarantees, namely uniformly bounded rank error, and outperforming $t$-digest by a factor of 2-20 on a range of synthetic and real-world datasets. To achieve such performance, we develop a novel approach that maintains a dynamic subdivision of the input range into buckets while fitting the input distribution using monotone cubic spline interpolation. The core challenge is implementing this method in a space-efficient manner while ensuring strong worst-case guarantees.
format Preprint
id arxiv_https___arxiv_org_abs_2504_01206
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle SplineSketch: Even More Accurate Quantiles with Error Guarantees
Łukasiewicz, Aleksander
Tětek, Jakub
Veselý, Pavel
Data Structures and Algorithms
Databases
Computation
Space-efficient streaming estimation of quantiles in massive datasets is a fundamental problem with numerous applications in data monitoring and analysis. While theoretical research led to optimal algorithms, such as the Greenwald-Khanna algorithm or the KLL sketch, practitioners often use other sketches that perform significantly better in practice but lack theoretical guarantees. Most notably, the widely used $t$-digest has unbounded worst-case error. In this paper, we seek to get the best of both worlds. We present a new quantile summary, SplineSketch, for numeric data, offering near-optimal theoretical guarantees, namely uniformly bounded rank error, and outperforming $t$-digest by a factor of 2-20 on a range of synthetic and real-world datasets. To achieve such performance, we develop a novel approach that maintains a dynamic subdivision of the input range into buckets while fitting the input distribution using monotone cubic spline interpolation. The core challenge is implementing this method in a space-efficient manner while ensuring strong worst-case guarantees.
title SplineSketch: Even More Accurate Quantiles with Error Guarantees
topic Data Structures and Algorithms
Databases
Computation
url https://arxiv.org/abs/2504.01206