Bibliographic Details
Main Authors: Keiser, John; Lemire, Daniel
Format: Preprint
Published: 2020
Online Access: https://arxiv.org/abs/2010.03090
Summary:
  • The majority of text is stored in UTF-8, which must be validated on ingestion. We present the lookup algorithm, which outperforms UTF-8 validation routines used in many libraries and languages by more than 10 times using commonly available SIMD instructions. To ensure reproducibility, our work is freely available as open source software.
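To make concrete what "validated on ingestion" involves, here is a minimal scalar sketch of a UTF-8 validity check. It is not the lookup algorithm from the record and uses no SIMD instructions. It is a byte-at-a-time baseline, assuming the standard well-formedness rules (no overlong encodings, no surrogate code points, nothing above U+10FFFF). The function name is_valid_utf8 is hypothetical and chosen for illustration only.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (scalar reference sketch)."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            # ASCII byte: always a complete, valid sequence.
            i += 1
            continue
        # For each lead byte, record how many continuation bytes follow
        # and the allowed range of the FIRST continuation byte.
        if 0xC2 <= b0 <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b0 == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF  # rejects overlong 3-byte forms
        elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xED:
            need, lo, hi = 2, 0x80, 0x9F  # rejects surrogates U+D800..U+DFFF
        elif b0 == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF  # rejects overlong 4-byte forms
        elif 0xF1 <= b0 <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F  # rejects code points above U+10FFFF
        else:
            # 0x80..0xC1 and 0xF5..0xFF can never start a sequence.
            return False
        if i + need >= n:
            return False  # sequence is truncated at the end of the input
        if not lo <= data[i + 1] <= hi:
            return False
        # Any remaining continuation bytes must be in 0x80..0xBF.
        for k in range(2, need + 1):
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += need + 1
    return True
```

For example, is_valid_utf8("héllo".encode("utf-8")) accepts the input, while the overlong sequence b"\xc0\xaf" and the truncated sequence b"\xe2\x82" are rejected. A branch-per-byte loop like this is the kind of routine a SIMD validator is meant to outperform. It is useful mainly as a correctness reference.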