Bibliographic Details
Main authors: Huang, Kaili; Venkatesh, Thejas; Dingankar, Uma; Mallia, Antonio; Campos, Daniel; Jiao, Jian; Potts, Christopher; Zaharia, Matei; Boahen, Kwabena; Khattab, Omar; Sarup, Saarthak; Santhanam, Keshav
Format: Preprint
Published: 2025
Online access: https://arxiv.org/abs/2504.14903
Summary:
  • We study serving retrieval models, specifically late interaction models like ColBERT, to many concurrent users under a small budget, in which the index may not fit in memory. We present ColBERT-serve, a novel serving system that applies a memory-mapping strategy to the ColBERT index, reducing RAM usage by 90% and permitting its deployment on cheap servers. It also incorporates a multi-stage architecture with hybrid scoring, reducing ColBERT's query latency and supporting many concurrent queries.