Guardado en:
Detalles Bibliográficos
Autor principal: Garraoui, Omar
Formato: Preprint
Publicado: 2025
Materias:
Acceso en línea:https://arxiv.org/abs/2601.03271
Etiquetas: Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
_version_ 1866911365400100864
author Garraoui, Omar
author_facet Garraoui, Omar
contents In this work, we propose an enhancement to the Boyer-Moore-Horspool algorithm tailored for natural language text. The approach involves preprocessing the search pattern to identify its statistically least frequent character, referred to as the "anchor." During the search, verification is first performed at this high-entropy position, allowing the algorithm to quickly discard non-matching windows. This fail-fast strategy reduces unnecessary comparisons, improving overall efficiency. Our implementation shows that incorporating basic linguistic statistics into classical pattern-matching techniques can boost performance without increasing complexity to the shift heuristics.
format Preprint
id arxiv_https___arxiv_org_abs_2601_03271
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Optimizing Exact String Matching via Statistical Anchoring
Garraoui, Omar
Data Structures and Algorithms
68W32 68P10
In this work, we propose an enhancement to the Boyer-Moore-Horspool algorithm tailored for natural language text. The approach involves preprocessing the search pattern to identify its statistically least frequent character, referred to as the "anchor." During the search, verification is first performed at this high-entropy position, allowing the algorithm to quickly discard non-matching windows. This fail-fast strategy reduces unnecessary comparisons, improving overall efficiency. Our implementation shows that incorporating basic linguistic statistics into classical pattern-matching techniques can boost performance without increasing complexity to the shift heuristics.
title Optimizing Exact String Matching via Statistical Anchoring
topic Data Structures and Algorithms
68W32 68P10
url https://arxiv.org/abs/2601.03271