Bibliographic Details
Main Authors: Huang, Kaili, Venkatesh, Thejas, Dingankar, Uma, Mallia, Antonio, Campos, Daniel, Jiao, Jian, Potts, Christopher, Zaharia, Matei, Boahen, Kwabena, Khattab, Omar, Sarup, Saarthak, Santhanam, Keshav
Format: Preprint
Published: 2025
Subjects: Information Retrieval
Online Access: https://arxiv.org/abs/2504.14903
author Huang, Kaili
Venkatesh, Thejas
Dingankar, Uma
Mallia, Antonio
Campos, Daniel
Jiao, Jian
Potts, Christopher
Zaharia, Matei
Boahen, Kwabena
Khattab, Omar
Sarup, Saarthak
Santhanam, Keshav
contents We study how to serve retrieval models, specifically late-interaction models like ColBERT, to many concurrent users on a small budget, where the index may not fit in memory. We present ColBERT-serve, a novel serving system that memory-maps the ColBERT index, reducing RAM usage by 90% and permitting deployment on cheap servers, and that incorporates a multi-stage architecture with hybrid scoring, reducing ColBERT's query latency and supporting many concurrent queries.
format Preprint
id arxiv_https___arxiv_org_abs_2504_14903
institution arXiv
publishDate 2025
record_format arxiv
title ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring
topic Information Retrieval
url https://arxiv.org/abs/2504.14903