| Main Authors: | Chen, Ziyi; Yang, Xiaocong; Lin, Jiacheng; Sun, Chenkai; Chang, Kevin Chen-Chuan; Huang, Jie |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2312.11462 |
Similar Items
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
by: Wen, Zhuofan, et al.
Published: (2024)
$\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts
by: Cemri, Mert, et al.
Published: (2025)
Faster Cascades via Speculative Decoding
by: Narasimhan, Harikrishna, et al.
Published: (2024)
Balancing Coverage and Draft Latency in Vocabulary Trimming for Faster Speculative Decoding
by: Shoham, Ofir Ben
Published: (2026)
PRISM: Parametrically Refactoring Inference for Speculative Sampling Draft Models
by: Wang, Xuliang, et al.
Published: (2026)
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
by: Li, Yuhui, et al.
Published: (2024)
OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding
by: Ramakrishnan, Ramchalam Kinattinkara, et al.
Published: (2025)
POSS: Position Specialist Generates Better Draft for Speculative Decoding
by: Huang, Langlin, et al.
Published: (2025)
TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs
by: Lee, Minjae, et al.
Published: (2026)
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
by: Timor, Nadav, et al.
Published: (2024)
Faster LLM Inference via Sequential Monte Carlo
by: Emara, Yahya, et al.
Published: (2026)
Faster MoE LLM Inference for Extremely Large Models
by: Yang, Haoqi, et al.
Published: (2025)
SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
by: Plaksin, Anton, et al.
Published: (2026)
Make Every Draft Count: Hidden State based Speculative Decoding
by: Chen, Yuetao, et al.
Published: (2026)
Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency
by: Li, Ruixiao, et al.
Published: (2025)
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
by: Yang, Penghui, et al.
Published: (2025)
ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts
by: Georganas, Evangelos, et al.
Published: (2025)
Training Domain Draft Models for Speculative Decoding: Best Practices and Insights
by: Hong, Fenglu, et al.
Published: (2025)
Self-Speculative Biased Decoding for Faster Re-Translation
by: Zeng, Linxiao, et al.
Published: (2025)
Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs
by: Goel, Raghavv, et al.
Published: (2024)
MineDraft: A Framework for Batch Parallel Speculative Decoding
by: Tang, Zhenwei, et al.
Published: (2026)
Multi-Draft Speculative Sampling: Canonical Decomposition and Theoretical Limits
by: Khisti, Ashish, et al.
Published: (2024)
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
by: Bachmann, Gregor, et al.
Published: (2025)
Speculative Streaming: Fast LLM Inference without Auxiliary Models
by: Bhendawade, Nikhil, et al.
Published: (2024)
Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding
by: Sun, Shuoyang, et al.
Published: (2026)
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
by: Sun, Hanshi, et al.
Published: (2024)
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
by: Qiu, Jiahao, et al.
Published: (2024)
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
by: Zhao, Weilin, et al.
Published: (2024)
FlashDecoding++: Faster Large Language Model Inference on GPUs
by: Hong, Ke, et al.
Published: (2023)
Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies
by: Timor, Nadav, et al.
Published: (2025)
FastEagle: Cascaded Drafting for Accelerating Speculative Decoding
by: Huang, Haiduo, et al.
Published: (2025)
Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding
by: Xiao, Bin, et al.
Published: (2024)
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
by: Butler, Branden, et al.
Published: (2024)
RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models
by: Huang, Jie, et al.
Published: (2023)
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
by: Pouransari, Hadi, et al.
Published: (2024)
AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability
by: Agrawal, Sudhanshu, et al.
Published: (2024)
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
by: Tseng, Albert, et al.
Published: (2024)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
by: Sun, Hanshi, et al.
Published: (2024)
A Theoretical Perspective for Speculative Decoding Algorithm
by: Yin, Ming, et al.
Published: (2024)
Calibrated Speculative Decoding: Frequency-Guided Candidate Selection for Efficient Inference
by: Zhou, Xuwen, et al.
Published: (2026)