Saved in:
| Main Author: | |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.04549 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Table of Contents:
- In the String Matching in Labeled Graphs (SMLG) problem, we need to determine whether a pattern string appears on a given labeled graph or a given automaton. Under the Orthogonal Vectors hypothesis, the SMLG problem cannot be solved in subquadratic time [ICALP 2019]. In typical bioinformatics applications, pattern matching algorithms should be both fast and space-efficient, so we need to determine useful classes of graphs on which the SLMG problem can be solved efficiently. In this paper, we improve on a recent result [STACS 2024] that shows how to solve the SMLG problem in linear time on the compressed representation of Wheeler generalized automata, a class of string-labeled automata that extend de Bruijn graphs. More precisely, we show how to remove the assumption that the automata contain no $ ε$-transitions (namely, edges labeled with the empty string), while retaining the same time and space bounds. This is a significant improvement because $ ε$-transitions add considerable expressive power (making it possible to jump to multiple states for free) and capture the complexity of regular expressions (through Thompson's construction for converting a regular expression into an equivalent automaton). We prove that, to enable $ ε$-transitions, we only need to store two additional bitvectors that can be constructed in linear time.