Bibliographic Details
Main Authors: Ghazi, Taha El; Starikovskaya, Tatiana
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2509.14898
_version_ 1866911163172782080
author Ghazi, Taha El
Starikovskaya, Tatiana
author_facet Ghazi, Taha El
Starikovskaya, Tatiana
contents In this work, we study the problem of detecting periodic trends in strings. While detecting exact periodicity has been studied extensively, real-world data is often noisy, with small deviations or mismatches between repetitions. This work focuses on a generalized approach to period detection that efficiently handles noise. Given a string $S$ of length $n$, the task is to identify integers $p$ such that the prefix and the suffix of $S$, each of length $n-p+1$, are similar under a given distance measure. Ergün et al. [APPROX-RANDOM 2017] were the first to study this problem in the streaming model under the Hamming distance. We combine, in a non-trivial way, the Hamming distance sketch of Clifford et al. [SODA 2019] and the structural description of the $k$-mismatch occurrences of a pattern in a text by Charalampopoulos et al. [FOCS 2020] to present a more efficient streaming algorithm for period detection under the Hamming distance. As a corollary, we derive a streaming algorithm for detecting periods of strings that may contain wildcards, special symbols that match any character of the alphabet. Our algorithm is not only more efficient than that of Ergün et al. [TCS 2020], but it also operates without their assumption that the string must be free of wildcards in its final characters. Additionally, we introduce the first two-pass streaming algorithm for computing periods under the edit distance by leveraging and extending the grammar decomposition technique of Bhattacharya and Koucký [STOC 2023].
format Preprint
id arxiv_https___arxiv_org_abs_2509_14898
institution arXiv
publishDate 2025
record_format arxiv
title Streaming periodicity with mismatches, wildcards, and edits
topic Data Structures and Algorithms
url https://arxiv.org/abs/2509.14898
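
The problem statement in the abstract can be illustrated with a naive offline check. The sketch below is not the streaming algorithm of the paper: it stores the whole string and takes quadratic time. It only shows what is being detected under the Hamming distance, with optional wildcards. The function names, the threshold parameter `k`, and the choice of `?` as the wildcard symbol are assumptions made for this illustration. The compared length `n - p + 1` is taken from the abstract as written; under that convention, `p = 1` compares the string with itself and always qualifies.

```python
from typing import List

WILDCARD = "?"  # hypothetical wildcard symbol chosen for this sketch


def mismatches(a: str, b: str, wildcard: str = WILDCARD) -> int:
    """Count positions where a and b differ; a wildcard matches any character."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(
        1
        for x, y in zip(a, b)
        if x != y and x != wildcard and y != wildcard
    )


def k_mismatch_periods(s: str, k: int, wildcard: str = WILDCARD) -> List[int]:
    """Return every p in 1..n such that the prefix and the suffix of s,
    each of length n - p + 1 (the length stated in the abstract),
    differ in at most k positions."""
    n = len(s)
    periods = []
    for p in range(1, n + 1):
        m = n - p + 1  # length of the compared prefix and suffix
        if mismatches(s[:m], s[n - m:], wildcard) <= k:
            periods.append(p)
    return periods


if __name__ == "__main__":
    print(k_mismatch_periods("abcabcabd", 0))  # exact periods only
    print(k_mismatch_periods("abcabcabd", 1))  # tolerate one mismatch
    print(k_mismatch_periods("ab?ab?", 0))     # wildcards match anything
```

A streaming algorithm cannot afford to keep `s` in memory, which is why the abstract relies on Hamming distance sketches and structural properties of $k$-mismatch occurrences instead of direct comparison.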