| Main Authors: | Sun, Hanshi, Haider, Momin, Zhang, Ruiqi, Yang, Huitao, Qiu, Jiahao, Yin, Ming, Wang, Mengdi, Bartlett, Peter, Zanette, Andrea |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.20290 |
Similar Items
A Theoretical Perspective for Speculative Decoding Algorithm
by: Yin, Ming, et al.
Published: (2024)
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
by: Sun, Hanshi, et al.
Published: (2024)
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
by: Qiu, Jiahao, et al.
Published: (2024)
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
by: Huang, Kaixuan, et al.
Published: (2024)
Reject Only Critical Tokens: Pivot-Aware Speculative Decoding
by: Ziashahabi, Amir, et al.
Published: (2025)
Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models
by: Sun, Chendong, et al.
Published: (2025)
Fast Large Language Model Collaborative Decoding via Speculation
by: Fu, Jiale, et al.
Published: (2025)
Graph-Structured Speculative Decoding
by: Gong, Zhuocheng, et al.
Published: (2024)
SpecTr: Fast Speculative Decoding via Optimal Transport
by: Sun, Ziteng, et al.
Published: (2023)
In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
by: Zhang, Ruiqi, et al.
Published: (2024)
Recurrent Drafter for Fast Speculative Decoding in Large Language Models
by: Cheng, Yunfei, et al.
Published: (2024)
Cost-Aware Diffusion Draft Trees for Speculative Decoding
by: Zhang, Shuai, et al.
Published: (2026)
Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism
by: Liu, Jiahao, et al.
Published: (2024)
DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference
by: Liu, Fuliang, et al.
Published: (2026)
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
by: Liu, Tianyu, et al.
Published: (2025)
Online Speculative Decoding
by: Liu, Xiaoxuan, et al.
Published: (2023)
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
by: Zhou, Zhaoyi, et al.
Published: (2025)
Training Domain Draft Models for Speculative Decoding: Best Practices and Insights
by: Hong, Fenglu, et al.
Published: (2025)
Training Language Models to Reason Efficiently
by: Arora, Daman, et al.
Published: (2025)
SAM Decoding: Speculative Decoding via Suffix Automaton
by: Hu, Yuxuan, et al.
Published: (2024)
Beyond the Target: From Imitation to Collaboration in Speculative Decoding
by: Li, Jinze, et al.
Published: (2026)
Decoding Speculative Decoding
by: Yan, Minghao, et al.
Published: (2024)
Speculative Decoding with a Speculative Vocabulary
by: Williams, Miles, et al.
Published: (2026)
Multi-Candidate Speculative Decoding
by: Yang, Sen, et al.
Published: (2024)
Dynamic Depth Decoding: Faster Speculative Decoding for LLMs
by: Brown, Oscar, et al.
Published: (2024)
Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement
by: Zhang, Ruiqi, et al.
Published: (2024)
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
by: Lv, Kai, et al.
Published: (2025)
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
by: Yi, Euiin, et al.
Published: (2024)
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation
by: Han, Ligong, et al.
Published: (2026)
Speculative Contrastive Decoding
by: Yuan, Hongyi, et al.
Published: (2023)
Traversal Verification for Speculative Tree Decoding
by: Weng, Yepeng, et al.
Published: (2025)
Accelerate Speculative Decoding with Sparse Computation in Verification
by: Wang, Jikai, et al.
Published: (2025)
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE
by: Huang, Haiduo, et al.
Published: (2025)
Faster Cascades via Speculative Decoding
by: Narasimhan, Harikrishna, et al.
Published: (2024)
Scaling Laws for Speculative Decoding
by: Yan, Siyuan, et al.
Published: (2025)
Speculate Deep and Accurate: Lossless and Training-Free Acceleration for Offloaded LLMs via Substitute Speculative Decoding
by: Wang, Pei-Shuo, et al.
Published: (2025)
Speculative Pipeline Decoding: Higher-Accuracy and Zero-Bubble Speculation via Pipeline Parallelism
by: Yu, Yijiong, et al.
Published: (2026)
Temperature-Centric Investigation of Speculative Decoding with Knowledge Distillation
by: Ouyang, Siru, et al.
Published: (2024)
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
by: Seo, Yeongbin, et al.
Published: (2025)
TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees
by: Liu, Tianyu, et al.
Published: (2026)