Enregistré dans:
Détails bibliographiques
Auteur principal: Xiao, Han
Format: Preprint
Publié: 2026
Sujets:
Accès en ligne:https://arxiv.org/abs/2602.00079
Tags: Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
_version_ 1866918408851816448
author Xiao, Han
author_facet Xiao, Han
contents We present an $ε$-bounded compression method for unit-norm embeddings that achieves 1.5$\times$ compression, 25% better than the best prior lossless method. The method exploits that spherical coordinates of high-dimensional unit vectors concentrate around $π/2$, causing IEEE 754 exponents to collapse to a single value and high-order mantissa bits to become predictable, enabling entropy coding of both. Reconstruction error is bounded by float32 machine epsilon ($1.19 \times 10^{-7}$), making reconstructed values indistinguishable from originals at float32 precision. Evaluation across 26 configurations spanning text, image, and multi-vector embeddings confirms consistent compression improvement with zero measurable retrieval degradation on BEIR benchmarks.
format Preprint
id arxiv_https___arxiv_org_abs_2602_00079
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Embedding Compression via Spherical Coordinates
Xiao, Han
Machine Learning
Computer Vision and Pattern Recognition
68T50
I.2.7
We present an $ε$-bounded compression method for unit-norm embeddings that achieves 1.5$\times$ compression, 25% better than the best prior lossless method. The method exploits that spherical coordinates of high-dimensional unit vectors concentrate around $π/2$, causing IEEE 754 exponents to collapse to a single value and high-order mantissa bits to become predictable, enabling entropy coding of both. Reconstruction error is bounded by float32 machine epsilon ($1.19 \times 10^{-7}$), making reconstructed values indistinguishable from originals at float32 precision. Evaluation across 26 configurations spanning text, image, and multi-vector embeddings confirms consistent compression improvement with zero measurable retrieval degradation on BEIR benchmarks.
title Embedding Compression via Spherical Coordinates
topic Machine Learning
Computer Vision and Pattern Recognition
68T50
I.2.7
url https://arxiv.org/abs/2602.00079