Bibliographic Details
Main Author: Marchet, Camille
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2409.05210
Table of Contents:
  • This paper provides a comprehensive survey of data structures for representing k-mer sets, which are fundamental in high-throughput sequencing analysis. It categorizes the methods into two main strategies: those using fingerprinting and hashing for compact storage, and those leveraging lexicographic properties for efficient representation. The paper reviews key operations supported by these structures, such as membership queries and dynamic updates, and highlights recent advancements in memory efficiency and query speed. A companion paper explores colored k-mer sets, which extend these concepts to integrate multiple datasets or genomes.
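As a rough illustration of the two strategies named in the summary, the Python sketch below contrasts a hashing-based structure with a lexicographic one, both answering membership queries. It is not taken from the paper: the Bloom filter is only an assumed representative of the fingerprinting and hashing family, the sorted list with binary search is an assumed stand-in for the lexicographic family, and the names `kmers`, `BloomKmerSet`, `SortedKmerSet`, `n_bits`, and `n_hashes` are hypothetical.

```python
import bisect
import hashlib


def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq, in order."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class BloomKmerSet:
    """Hashing-based sketch: a Bloom filter over k-mers.

    Assumption: chosen only as a simple example of the hashing family.
    Queries never miss an inserted k-mer but may report false positives.
    """

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive n_hashes bit positions by salting one hash function.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, kmer):
        # Dynamic update: set the bit at every derived position.
        for p in self._positions(kmer):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, kmer):
        return all(
            self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(kmer)
        )


class SortedKmerSet:
    """Lexicographic sketch: distinct k-mers kept sorted, queried by binary search.

    Assumption: a static, exact structure with no compression applied.
    """

    def __init__(self, kmer_iter):
        self.sorted_kmers = sorted(set(kmer_iter))

    def __contains__(self, kmer):
        i = bisect.bisect_left(self.sorted_kmers, kmer)
        return i < len(self.sorted_kmers) and self.sorted_kmers[i] == kmer


if __name__ == "__main__":
    sequence = "ACGTACGTGA"  # toy input, not real sequencing data
    approx = BloomKmerSet()
    for km in kmers(sequence, 4):
        approx.add(km)
    exact = SortedKmerSet(kmers(sequence, 4))
    print("ACGT" in approx, "ACGT" in exact)
    print("TTTT" in exact)
```

In this sketch the Bloom filter supports insertions after construction at the cost of possible false positives, while the sorted list gives exact answers but would need rebuilding or re-sorting to absorb new k-mers.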