Bibliographic Details
Main Authors: Ho, Sophia, Park, Jinsol, Wang, Patrick
Format: Preprint
Published: 2024
Subjects: Computation and Language; Artificial Intelligence; Databases
Online Access: https://arxiv.org/abs/2408.04678
author Ho, Sophia
Park, Jinsol
Wang, Patrick
contents We present CREST (Compact Retrieval-Based Speculative Decoding), a redesign of REST that allows its datastore to be effectively "compacted". REST is a drafting technique for speculative decoding that retrieves, from a datastore, exact n-gram matches of the most recent n tokens generated by the target LLM. The key idea of CREST is to store only a subset of the smallest and most common n-grams in the datastore, aiming to achieve comparable performance with less storage space. We found that storing a subset of n-grams both reduces storage space and improves performance. On the HumanEval and MT Bench benchmarks, CREST matches REST's accepted token length with 10.6–13.5× less storage space and achieves a 16.5–17.1% higher acceptance length than REST when using the same storage space.
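To make the drafting and compaction ideas in the abstract concrete, here is a toy Python sketch, assuming integer token IDs and a plain dictionary as the datastore. The names `count_ngrams`, `compact`, and `draft` and the parameters `max_context`, `draft_len`, and `budget` are hypothetical and do not come from the paper. The lookup is a simplified REST-like longest-suffix match that returns a single continuation, and the compaction rule (cap the context length, then keep the `budget` most frequent entries) is only one assumed reading of "store only the smallest and most common n-grams", not the authors' method.

```python
# Hypothetical toy sketch of retrieval-based drafting with a compacted
# datastore; not the REST or CREST implementation.
from collections import Counter, defaultdict


def count_ngrams(corpus, max_context, draft_len):
    """Count (context, continuation) pairs: every context of 1..max_context
    tokens paired with the draft_len tokens that follow it."""
    counts = Counter()
    for seq in corpus:
        for n in range(1, max_context + 1):
            for i in range(len(seq) - n - draft_len + 1):
                context = tuple(seq[i:i + n])
                continuation = tuple(seq[i + n:i + n + draft_len])
                counts[(context, continuation)] += 1
    return counts


def compact(counts, max_context, budget):
    """Assumed compaction rule: drop contexts longer than max_context, then
    keep only the `budget` most frequent (context, continuation) pairs."""
    short = [(key, c) for key, c in counts.items() if len(key[0]) <= max_context]
    # Most frequent first; ties go to shorter contexts, then a stable key order.
    short.sort(key=lambda kc: (-kc[1], len(kc[0][0]), kc[0]))
    store = defaultdict(list)
    for (context, continuation), c in short[:budget]:
        store[context].append((c, continuation))
    for candidates in store.values():
        candidates.sort(reverse=True)  # most frequent continuation first
    return dict(store)


def draft(store, recent_tokens, max_context):
    """Match the longest suffix of recent_tokens found in the store, falling
    back to shorter suffixes; return its most frequent continuation."""
    for n in range(min(max_context, len(recent_tokens)), 0, -1):
        context = tuple(recent_tokens[-n:])
        if context in store:
            return list(store[context][0][1])
    return []  # no match: no draft tokens this step


corpus = [[1, 2, 3, 4, 1, 2, 3, 5, 1, 2, 3, 4]]
counts = count_ngrams(corpus, max_context=3, draft_len=2)
store = compact(counts, max_context=2, budget=5)
print(draft(store, [9, 1, 2], max_context=2))  # [3, 4]
print(draft(store, [5], max_context=2))        # [] (pruned by compaction)
```

In this toy run only five entries survive compaction: the frequent context `(1, 2)` still drafts `[3, 4]`, while the rare context `(5,)` was pruned and yields no draft. A real decoder would pass drafted tokens to the target LLM for verification, a step this sketch omits.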
format Preprint
id arxiv_https___arxiv_org_abs_2408_04678
institution arXiv
publishDate 2024
record_format arxiv
title CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding
topic Computation and Language
Artificial Intelligence
Databases
url https://arxiv.org/abs/2408.04678