Bibliographic Details
Main Authors: Iakovlev, Zakhar, Chulkov, Alexey, Golikov, Nikita, Lukianov, Vyacheslav, Zinoviev, Nikita, Ivanov, Dmitry, Aksenov, Vitaly
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2403.03751
Table of Contents:
  • One common way to speed up search across a set of text files is a trigram index. The structure is simply a map from each trigram (a sequence of three characters) to the set of files that contain it. When searching for a pattern, candidate files are identified by intersecting the sets associated with the trigrams in the pattern, and the search then proceeds only in those files. In a code repository, however, the trigram index changes from version to version. When a new version is checked out, the index is typically rebuilt from scratch, which is time-consuming, whereas we want the index to have almost zero startup time. We therefore explore a persistent version of the trigram index for full-text and keyword pattern search. Our approach reuses the current version of the trigram index and applies only the changes between versions during checkout, which significantly improves performance. Furthermore, we extend the data structure to support CamelHump search for class and function names.
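
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' persistent data structure: it is a plain in-memory index that is updated in place, meant only to show the trigram map, candidate selection by set intersection, and updating the index from a per-checkout list of changed files instead of rebuilding it. The names `TrigramIndex`, `apply_diff`, and the `changes` mapping format are assumptions made for this illustration, and CamelHump search is not covered.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all three-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Illustrative (hypothetical) index: trigram -> set of file paths."""

    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> paths containing it
        self.files = {}                   # path -> current file content

    def add_file(self, path, content):
        self.files[path] = content
        for gram in trigrams(content):
            self.postings[gram].add(path)

    def remove_file(self, path):
        content = self.files.pop(path)
        for gram in trigrams(content):
            paths = self.postings.get(gram)
            if paths is None:
                continue
            paths.discard(path)
            if not paths:
                del self.postings[gram]  # drop empty posting sets

    def apply_diff(self, changes):
        """Update the index for a checkout.

        changes maps path -> new content, or None if the file was deleted
        (an assumed format). Only the listed files are touched.
        """
        for path, content in changes.items():
            if path in self.files:
                self.remove_file(path)
            if content is not None:
                self.add_file(path, content)

    def candidates(self, pattern):
        """Files containing every trigram of pattern (a superset of matches)."""
        grams = trigrams(pattern)
        if not grams:
            # Patterns shorter than three characters cannot be filtered.
            return set(self.files)
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search(self, pattern):
        """Verify the pattern only in the candidate files."""
        return sorted(p for p in self.candidates(pattern)
                      if pattern in self.files[p])


index = TrigramIndex()
index.add_file("a.py", "def find_user(): pass")
index.add_file("b.py", "class UserFinder: pass")
hits_before = index.search("find_user")

# Checking out another version: apply only the changed files.
index.apply_diff({"a.py": None, "c.py": "def find_user_by_id(): pass"})
hits_after = index.search("find_user")
```

In this sketch, the cost of `apply_diff` depends on the size of the changed files, not on the size of the repository, which is the property the abstract is after. A persistent structure as described in the paper would additionally keep earlier versions of the index accessible, which this mutable version does not do.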