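| Main Authors: | Huang, Kaili; Venkatesh, Thejas; Dingankar, Uma; Mallia, Antonio; Campos, Daniel; Jiao, Jian; Potts, Christopher; Zaharia, Matei; Boahen, Kwabena; Khattab, Omar; Sarup, Saarthak; Santhanam, Keshav |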
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Information Retrieval |
| Online Access: | https://arxiv.org/abs/2504.14903 |

| Field | Value |
|---|---|
| author | Huang, Kaili; Venkatesh, Thejas; Dingankar, Uma; Mallia, Antonio; Campos, Daniel; Jiao, Jian; Potts, Christopher; Zaharia, Matei; Boahen, Kwabena; Khattab, Omar; Sarup, Saarthak; Santhanam, Keshav |
| contents | We study serving retrieval models, specifically late interaction models like ColBERT, to many concurrent users at once and under a small budget, in which the index may not fit in memory. We present ColBERT-serve, a novel serving system that applies a memory-mapping strategy to the ColBERT index, reducing RAM usage by 90% and permitting its deployment on cheap servers, and incorporates a multi-stage architecture with hybrid scoring, reducing ColBERT's query latency and supporting many concurrent queries in parallel. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2504_14903 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring |
| topic | Information Retrieval |
| url | https://arxiv.org/abs/2504.14903 |