| Main Authors: | Ho, Sophia; Park, Jinsol; Wang, Patrick |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computation and Language; Artificial Intelligence; Databases |
| Online Access: | https://arxiv.org/abs/2408.04678 |
| _version_ | 1866911982646460416 |
|---|---|
| author | Ho, Sophia; Park, Jinsol; Wang, Patrick |
| author_facet | Ho, Sophia; Park, Jinsol; Wang, Patrick |
| contents | We present CREST (Compact Retrieval-Based Speculative Decoding), a redesign of REST that allows it to be effectively "compacted". REST is a drafting technique for speculative decoding based on retrieving exact n-gram matches of the most recent n tokens generated by the target LLM from a datastore. The key idea of CREST is to only store a subset of the smallest and most common n-grams in the datastore with the hope of achieving comparable performance with less storage space. We found that storing a subset of n-grams both reduces storage space and improves performance. CREST matches REST's accepted token length with 10.6-13.5x less storage space and achieves a 16.5-17.1% higher acceptance length than REST using the same storage space on the HumanEval and MT Bench benchmarks. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2408_04678 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding |
| topic | Computation and Language; Artificial Intelligence; Databases |
| url | https://arxiv.org/abs/2408.04678 |
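
The abstract describes drafting by exact n-gram lookup and compaction by keeping only the smallest and most common n-grams. The following is a toy sketch of that idea, not the authors' implementation: the function names `build_datastore` and `draft_next`, the `max_n` and `top_k` parameters, the frequency-based ranking, and the single-token draft are all assumptions made for illustration, and verification of the draft by the target LLM is omitted.

```python
from collections import Counter, defaultdict


def build_datastore(tokens, max_n=2, top_k=None):
    """Map each context of 1..max_n tokens to a Counter of the tokens that follow it.

    If top_k is given, keep only the top_k most frequent contexts
    (an assumed stand-in for the compaction step described in the abstract).
    """
    continuations = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n):
            context = tuple(tokens[i:i + n])
            continuations[context][tokens[i + n]] += 1
    if top_k is None:
        return dict(continuations)
    # Rank by frequency (descending), then prefer shorter contexts on ties.
    ranked = sorted(
        continuations.items(),
        key=lambda kv: (-sum(kv[1].values()), len(kv[0]), kv[0]),
    )
    return dict(ranked[:top_k])


def draft_next(datastore, recent, max_n=2):
    """Return the most common continuation of the longest stored suffix of `recent`.

    Returns None when no suffix of the recent tokens is in the datastore.
    """
    for n in range(min(max_n, len(recent)), 0, -1):
        context = tuple(recent[-n:])
        if context in datastore:
            return datastore[context].most_common(1)[0][0]
    return None
```

With the corpus `"the cat sat on the mat the cat ran".split()`, the full datastore holds 11 contexts, while `top_k=3` keeps only `("the",)`, `("cat",)`, and `("the", "cat")`. A lookup on the recent tokens `["on", "the"]` then falls back from the missing 2-gram to the stored 1-gram `("the",)`, which illustrates how a compacted datastore can still produce a draft from a shorter match.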