Saved in:
| Main Authors: | , |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.12639 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866907783080706048 |
|---|---|
| author | Fujinami, Hiroya Hasuo, Ichiro |
| author_facet | Fujinami, Hiroya Hasuo, Ichiro |
| contents | Regular expression (regex) matching is fundamental in many applications, especially in web services. However, matching by backtracking -- preferred by most real-world implementations for its practical performance and backward compatibility -- can suffer from so-called catastrophic backtracking, which makes the number of backtracking super-linear and leads to the well-known ReDoS vulnerability. Inspired by a recent algorithm by Davis et al. that runs in linear time for (non-extended) regexes, we study efficient backtracking matching for regexes with two common extensions, namely look-around and atomic grouping. We present linear-time backtracking matching algorithms for these extended regexes. Their efficiency relies on memoization, much like the one by Davis et al.; we also strive for smaller memoization tables by carefully trimming their range. Our experiments -- we used some real-world regexes with the aforementioned extensions -- confirm the performance advantage of our algorithms. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2401_12639 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| spellingShingle | Efficient Matching with Memoization for Regexes with Look-around and Atomic Grouping (Extended Version) Fujinami, Hiroya Hasuo, Ichiro Programming Languages Regular expression (regex) matching is fundamental in many applications, especially in web services. However, matching by backtracking -- preferred by most real-world implementations for its practical performance and backward compatibility -- can suffer from so-called catastrophic backtracking, which makes the number of backtracking super-linear and leads to the well-known ReDoS vulnerability. Inspired by a recent algorithm by Davis et al. that runs in linear time for (non-extended) regexes, we study efficient backtracking matching for regexes with two common extensions, namely look-around and atomic grouping. We present linear-time backtracking matching algorithms for these extended regexes. Their efficiency relies on memoization, much like the one by Davis et al.; we also strive for smaller memoization tables by carefully trimming their range. Our experiments -- we used some real-world regexes with the aforementioned extensions -- confirm the performance advantage of our algorithms. |
| title | Efficient Matching with Memoization for Regexes with Look-around and Atomic Grouping (Extended Version) |
| topic | Programming Languages |
| url | https://arxiv.org/abs/2401.12639 |