Bibliographic Details
Main Authors:	Sabale, Diandre Miguel; Gatterbauer, Wolfgang; Pandey, Prashant
Format:	Preprint
Published:	2026
Subjects:	Data Structures and Algorithms
Online Access:	https://arxiv.org/abs/2602.13484

_version_	1866917274531659776
author	Sabale, Diandre Miguel; Gatterbauer, Wolfgang; Pandey, Prashant
author_facet	Sabale, Diandre Miguel; Gatterbauer, Wolfgang; Pandey, Prashant
contents	Filters are ubiquitous in computer science, enabling space-efficient approximate membership testing. Since Bloom filters were introduced in 1970, decades of work have improved their space efficiency and performance. Recently, three new paradigms have emerged that offer orders-of-magnitude improvements in false positive rates (FPRs) by using information beyond the input set: (1) learned filters train a model to distinguish members from non-members, (2) stacked filters use negative workload samples to build cascading layers, and (3) adaptive filters update their internal representation in response to false positive feedback. Yet each paradigm targets specific use cases, introduces complex configuration tuning, and has been evaluated in isolation. The result is unclear trade-offs and a gap in understanding how these approaches compare and when each is most appropriate. This paper presents the first comprehensive evaluation of learned, stacked, and adaptive filters across real-world datasets and query workloads. Our results reveal critical trade-offs: (1) Learned filters achieve up to 10^2 times lower FPRs but exhibit high variance and lack robustness under skewed or dynamic workloads. Critically, model inference overhead makes their query latencies up to 10^4 times slower than those of stacked or adaptive filters. (2) Stacked filters reliably achieve up to 10^3 times lower FPRs on skewed workloads but require workload knowledge. (3) Adaptive filters are robust across settings, achieving up to 10^3 times lower FPRs under adversarial queries without workload assumptions. Based on our analysis, learned filters suit stable workloads where input features enable effective model training and space constraints are paramount, stacked filters excel when reliable query distributions are known, and adaptive filters are the most generalizable, providing robust, theoretically bounded guarantees even in dynamic or adversarial environments.
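The abstract describes stacked filters in a single phrase ("use negative workload samples to build cascading layers"). A minimal Python sketch of that idea follows. It assumes plain Bloom filters as layers, one target FPR shared by every layer, and an odd layer count. `BloomFilter`, `StackedFilter`, `num_layers`, and `fpr` are hypothetical names chosen for illustration; this is not the authors' code or configuration.

```python
import hashlib
import math
from typing import Iterable, Iterator


class BloomFilter:
    """Plain Bloom filter used as one layer of the stack (illustrative)."""

    def __init__(self, n_items: int, fpr: float, salt: int = 0) -> None:
        n = max(1, n_items)
        # Textbook sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(8, math.ceil(-n * math.log(fpr) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)
        self.salt = salt  # distinct salt per layer -> independent hash functions

    def _positions(self, key: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit digest halves.
        digest = hashlib.sha256(f"{self.salt}|{key}".encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


class StackedFilter:
    """Hypothetical stacked filter: even layers hold members, odd layers hold
    sampled non-members that slipped through the layer above them."""

    def __init__(self, positives: Iterable[str], negatives: Iterable[str],
                 num_layers: int = 3, fpr: float = 0.01) -> None:
        if num_layers < 1 or num_layers % 2 == 0:
            raise ValueError("num_layers must be odd so the stack ends on a member layer")
        pos_left, neg_left = set(positives), set(negatives)
        self.layers: list[BloomFilter] = []
        for depth in range(num_layers):
            keys = pos_left if depth % 2 == 0 else neg_left
            layer = BloomFilter(len(keys), fpr, salt=depth)
            for key in keys:
                layer.add(key)
            self.layers.append(layer)
            if depth % 2 == 0:
                # Only sampled negatives that are false positives here go deeper.
                neg_left = {x for x in neg_left if x in layer}
                if not neg_left:
                    break
            else:
                # Only members that collide with this negative layer go deeper.
                pos_left = {x for x in pos_left if x in layer}
                if not pos_left:
                    break

    def __contains__(self, key: str) -> bool:
        for depth, layer in enumerate(self.layers):
            if key not in layer:
                # Rejected by a member layer -> not a member;
                # rejected by a negative layer -> report as a member.
                return depth % 2 == 1
        # Passed every layer: answer with the polarity of the deepest layer.
        return (len(self.layers) - 1) % 2 == 0
```

Every member is inserted into each member layer it can reach, so the sketch never returns a false negative; sampled non-members that pass the first layer are caught by deeper layers. Keys absent from the negative sample still see roughly the first layer's FPR, which mirrors the abstract's point that stacked filters pay off only when the query distribution is known.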
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_13484
institution	arXiv
publishDate	2026
record_format	arxiv
title	How to Train Your Filter: Should You Learn, Stack or Adapt?
topic	Data Structures and Algorithms
url	https://arxiv.org/abs/2602.13484

Similar Items