Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Mussmann, Stephen, Raje, Mehul Smriti, Tumkur, Kavya, Messoussi, Oumayma, Hachem, Cyprien, Jacob, Seby
Format:	Preprint
Published:	2026
Subjects:	Data Structures and Algorithms
Online Access:	https://arxiv.org/abs/2601.11765
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866911382139568128
author	Mussmann, Stephen Raje, Mehul Smriti Tumkur, Kavya Messoussi, Oumayma Hachem, Cyprien Jacob, Seby
author_facet	Mussmann, Stephen Raje, Mehul Smriti Tumkur, Kavya Messoussi, Oumayma Hachem, Cyprien Jacob, Seby
contents	Semantic embeddings to represent objects such as image, text and audio are widely used in machine learning and have spurred the development of vector similarity search methods for retrieving semantically related objects. In this work, we study the sibling task of estimating a sum over all objects in a set, such as the kernel density estimate (KDE) and the normalizing constant for softmax distributions. While existing solutions provably reduce the sum estimation task to acquiring $\mathcal{O}(\sqrt{n})$ most similar vectors, where $n$ is the number of objects, we introduce a novel algorithm that only requires $\mathcal{O}(\log(n))$ most similar vectors. Our approach randomly assigns objects to levels with exponentially-decaying probabilities and constructs a vector similarity search data structure for each level. With the top-$k$ objects from each level, we propose an unbiased estimate of the sum and prove a high-probability relative error bound. We run experiments on OpenImages and Amazon Reviews with a vector similar search implementation to show that our method can achieve lower error using less computational time than existing reductions. We show results on applications in estimating densities, computing softmax denominators, and counting the number of vectors within a ball.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_11765
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Sum Estimation via Vector Similarity Search Mussmann, Stephen Raje, Mehul Smriti Tumkur, Kavya Messoussi, Oumayma Hachem, Cyprien Jacob, Seby Data Structures and Algorithms Semantic embeddings to represent objects such as image, text and audio are widely used in machine learning and have spurred the development of vector similarity search methods for retrieving semantically related objects. In this work, we study the sibling task of estimating a sum over all objects in a set, such as the kernel density estimate (KDE) and the normalizing constant for softmax distributions. While existing solutions provably reduce the sum estimation task to acquiring $\mathcal{O}(\sqrt{n})$ most similar vectors, where $n$ is the number of objects, we introduce a novel algorithm that only requires $\mathcal{O}(\log(n))$ most similar vectors. Our approach randomly assigns objects to levels with exponentially-decaying probabilities and constructs a vector similarity search data structure for each level. With the top-$k$ objects from each level, we propose an unbiased estimate of the sum and prove a high-probability relative error bound. We run experiments on OpenImages and Amazon Reviews with a vector similar search implementation to show that our method can achieve lower error using less computational time than existing reductions. We show results on applications in estimating densities, computing softmax denominators, and counting the number of vectors within a ball.
title	Sum Estimation via Vector Similarity Search
topic	Data Structures and Algorithms
url	https://arxiv.org/abs/2601.11765

Similar Items