Bibliographic Details
Main Author: Lù, Xing Han
Format: Preprint
Published: 2024
Subjects: Information Retrieval; Computation and Language
Online Access: https://arxiv.org/abs/2407.03618
author Lù, Xing Han
author_facet Lù, Xing Han
contents We introduce BM25S, an efficient Python-based implementation of BM25 that depends only on NumPy and SciPy. BM25S achieves up to a 500x speedup compared to the most popular Python-based framework by eagerly computing BM25 scores during indexing and storing them in sparse matrices. It also achieves considerable speedups compared to highly optimized Java-based implementations, which are used by popular commercial products. Finally, BM25S reproduces the exact implementation of five BM25 variants based on Kamphuis et al. (2020) by extending eager scoring to non-sparse variants using a novel score-shifting method. The code can be found at https://github.com/xhluca/bm25s
format Preprint
id arxiv_https___arxiv_org_abs_2407_03618
institution arXiv
publishDate 2024
record_format arxiv
title BM25S: Orders of magnitude faster lexical search via eager sparse scoring
topic Information Retrieval
Computation and Language
url https://arxiv.org/abs/2407.03618
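example The abstract describes eager scoring: BM25 scores are computed once at indexing time and stored sparsely, so a query only has to look up and sum precomputed values. The sketch below illustrates that idea under stated assumptions and is not the BM25S code. It uses plain Python lists in a compressed-sparse-column (CSC) layout as a stand-in for a SciPy sparse matrix, one common Lucene-style BM25 formula, and illustrative defaults for k1 and b. The names build_index and retrieve are hypothetical.

```python
import math
from collections import Counter


def build_index(corpus_tokens, k1=1.5, b=0.75):
    """Eagerly compute BM25 scores and store them in a CSC-like layout.

    Assumption: a Lucene-style formula, idf * tf / (tf + k1 * (1 - b + b * dl / avgdl)).
    """
    n_docs = len(corpus_tokens)
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Collect postings: token -> list of (doc_id, term frequency).
    postings = {}
    for doc_id, tokens in enumerate(corpus_tokens):
        for token, tf in Counter(tokens).items():
            postings.setdefault(token, []).append((doc_id, tf))

    # One column per token; only (token, document) pairs that occur are stored.
    vocab, data, indices, indptr = {}, [], [], [0]
    for col, (token, plist) in enumerate(sorted(postings.items())):
        vocab[token] = col
        df = len(plist)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in plist:
            dl = len(corpus_tokens[doc_id])
            norm = tf + k1 * (1 - b + b * dl / avgdl)
            data.append(idf * tf / norm)  # score is final at index time
            indices.append(doc_id)
        indptr.append(len(data))

    return {"vocab": vocab, "data": data, "indices": indices,
            "indptr": indptr, "n_docs": n_docs}


def retrieve(index, query_tokens, k=3):
    """Sum the precomputed scores of each query token, then take the top k."""
    scores = [0.0] * index["n_docs"]
    for token in query_tokens:
        col = index["vocab"].get(token)
        if col is None:
            continue  # token never seen at indexing time
        start, end = index["indptr"][col], index["indptr"][col + 1]
        for pos in range(start, end):
            scores[index["indices"][pos]] += index["data"][pos]
    ranked = sorted(range(index["n_docs"]), key=lambda i: scores[i], reverse=True)
    return [(doc_id, scores[doc_id]) for doc_id in ranked[:k]]


if __name__ == "__main__":
    corpus = [
        "a cat purrs".split(),
        "a dog barks".split(),
        "a bird sings a song".split(),
    ]
    index = build_index(corpus)
    print(retrieve(index, "dog barks".split(), k=2))
```

Because this formula gives a score of zero whenever a token is absent from a document, the stored structure stays sparse. The variants the abstract calls non-sparse do not have that property, which is the case the score-shifting method is said to address; this sketch does not cover it.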