Saved in:
| Main Authors: | , |
|---|---|
| Format: | Preprint |
| Published: |
2023
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2307.08833 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866913510768771072 |
|---|---|
| author | De, Rajat Kempa, Dominik |
| author_facet | De, Rajat Kempa, Dominik |
| contents | Grammar compression is a general compression framework in which a string $T$ of length $N$ is represented as a context-free grammar of size $n$ whose language contains only $T$. In this paper, we focus on studying the limitations of algorithms and data structures operating on strings in grammar-compressed form. Previous work focused on proving lower bounds for grammars constructed using algorithms that achieve the approximation ratio $ρ=\mathcal{O}(\text{polylog }N)$. Unfortunately, for the majority of grammar compressors, $ρ$ is either unknown or satisfies $ρ=ω(\text{polylog }N)$. In their seminal paper, Charikar et al. [IEEE Trans. Inf. Theory 2005] studied seven popular grammar compression algorithms: RePair, Greedy, LongestMatch, Sequential, Bisection, LZ78, and $α$-Balanced. Only one of them ($α$-Balanced) is known to achieve $ρ=\mathcal{O}(\text{polylog }N)$.
We develop the first technique for proving lower bounds for data structures and algorithms on grammars that is fully general and does not depend on the approximation ratio $ρ$ of the used grammar compressor. Using this technique, we first prove that $Ω(\log N/\log \log N)$ time is required for random access on RePair, Greedy, LongestMatch, Sequential, and Bisection, while $Ω(\log\log N)$ time is required for random access to LZ78. All these lower bounds hold within space $\mathcal{O}(n\text{ polylog }N)$ and match the existing upper bounds. We also generalize this technique to prove several conditional lower bounds for compressed computation. For example, we prove that unless the Combinatorial $k$-Clique Conjecture fails, there is no combinatorial algorithm for CFG parsing on Bisection (for which it holds $ρ=\tildeΘ(N^{1/2})$) that runs in $\mathcal{O}(n^c\cdot N^{3-ε})$ time for all constants $c>0$ and $ε>0$. Previously, this was known only for $c<2ε$. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2307_08833 |
| institution | arXiv |
| publishDate | 2023 |
| record_format | arxiv |
| spellingShingle | Grammar Boosting: A New Technique for Proving Lower Bounds for Computation over Compressed Data De, Rajat Kempa, Dominik Data Structures and Algorithms Grammar compression is a general compression framework in which a string $T$ of length $N$ is represented as a context-free grammar of size $n$ whose language contains only $T$. In this paper, we focus on studying the limitations of algorithms and data structures operating on strings in grammar-compressed form. Previous work focused on proving lower bounds for grammars constructed using algorithms that achieve the approximation ratio $ρ=\mathcal{O}(\text{polylog }N)$. Unfortunately, for the majority of grammar compressors, $ρ$ is either unknown or satisfies $ρ=ω(\text{polylog }N)$. In their seminal paper, Charikar et al. [IEEE Trans. Inf. Theory 2005] studied seven popular grammar compression algorithms: RePair, Greedy, LongestMatch, Sequential, Bisection, LZ78, and $α$-Balanced. Only one of them ($α$-Balanced) is known to achieve $ρ=\mathcal{O}(\text{polylog }N)$. We develop the first technique for proving lower bounds for data structures and algorithms on grammars that is fully general and does not depend on the approximation ratio $ρ$ of the used grammar compressor. Using this technique, we first prove that $Ω(\log N/\log \log N)$ time is required for random access on RePair, Greedy, LongestMatch, Sequential, and Bisection, while $Ω(\log\log N)$ time is required for random access to LZ78. All these lower bounds hold within space $\mathcal{O}(n\text{ polylog }N)$ and match the existing upper bounds. We also generalize this technique to prove several conditional lower bounds for compressed computation. For example, we prove that unless the Combinatorial $k$-Clique Conjecture fails, there is no combinatorial algorithm for CFG parsing on Bisection (for which it holds $ρ=\tildeΘ(N^{1/2})$) that runs in $\mathcal{O}(n^c\cdot N^{3-ε})$ time for all constants $c>0$ and $ε>0$. Previously, this was known only for $c<2ε$. |
| title | Grammar Boosting: A New Technique for Proving Lower Bounds for Computation over Compressed Data |
| topic | Data Structures and Algorithms |
| url | https://arxiv.org/abs/2307.08833 |