| Main Authors: | Neelam, Sanjit, Heinlein, Daniel, Cvicek, Vaclav, Mishra, Akshay, Pope, Reiner |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2504.06419 |
Similar Items
Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies
by: Timor, Nadav, et al.
Published: (2025)
Out-of-Vocabulary Sampling Boosts Speculative Decoding
by: Timor, Nadav, et al.
Published: (2025)
SpecPipe: Accelerating Pipeline Parallelism-based LLM Inference with Speculative Decoding
by: Yin, Haofei, et al.
Published: (2025)
Fast Inference via Hierarchical Speculative Decoding
by: Mohri, Clara, et al.
Published: (2025)
Speculative Speculative Decoding
by: Kumar, Tanishq, et al.
Published: (2026)
Speculative Decoding Scaling Laws (SDSL): Throughput Optimization Made Simple
by: Bozorgkhoo, Amirhossein, et al.
Published: (2026)
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
by: Wen, Zhuofan, et al.
Published: (2024)
Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement
by: Jeon, Wonseok, et al.
Published: (2024)
Lever: Speculative LLM Inference on Smartphones
by: Wang, Tuowei, et al.
Published: (2026)
Energy-Efficient Wireless LLM Inference via Uncertainty and Importance-Aware Speculative Decoding
by: Park, Jihoon, et al.
Published: (2025)
Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency
by: Li, Ruixiao, et al.
Published: (2025)
Hydragen: High-Throughput LLM Inference with Shared Prefixes
by: Juravsky, Jordan, et al.
Published: (2024)
Decoding Speculative Decoding
by: Yan, Minghao, et al.
Published: (2024)
An Interpretable Latency Model for Speculative Decoding in LLM Serving
by: Kong, Linghao, et al.
Published: (2026)
Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation
by: Gui, Lujun, et al.
Published: (2024)
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
by: Huang, Kaixuan, et al.
Published: (2024)
CARD: A Cache-Assisted Parallel Speculative Decoding Framework via Query-and-Correct Paradigm for Accelerating LLM Inference
by: Zhou, Enyu, et al.
Published: (2025)
Collaborative Speculative Inference for Efficient LLM Inference Serving
by: Gao, Luyao, et al.
Published: (2025)
Calibrated Speculative Decoding: Frequency-Guided Candidate Selection for Efficient Inference
by: Zhou, Xuwen, et al.
Published: (2026)
Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding
by: Xiao, Bin, et al.
Published: (2024)
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE
by: Huang, Haiduo, et al.
Published: (2025)
Cascade Speculative Drafting for Even Faster LLM Inference
by: Chen, Ziyi, et al.
Published: (2023)
FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference
by: Bajpai, Divya Jyoti, et al.
Published: (2025)
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
by: Zhao, Yilong, et al.
Published: (2025)
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding
by: Ryu, Hyun, et al.
Published: (2024)
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
by: Elhoushi, Mostafa, et al.
Published: (2024)
Online Speculative Decoding
by: Liu, Xiaoxuan, et al.
Published: (2023)
Speculative Safety-Aware Decoding
by: Wang, Xuekang, et al.
Published: (2025)
Speculative Decoding Across Languages
by: Paudel, Nirajan, et al.
Published: (2026)
SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding
by: Zhang, Ziyi, et al.
Published: (2025)
Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents
by: Dai, J. G., et al.
Published: (2025)
Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding
by: Koh, Jungyeon, et al.
Published: (2025)
LLM-42: Enabling Determinism in LLM Inference with Verified Speculation
by: Gond, Raja, et al.
Published: (2026)
Principled Coarse-Grained Acceptance for Speculative Decoding in Speech
by: Yanuka, Moran, et al.
Published: (2025)
Compiler-Assisted Speculative Sampling for Accelerated LLM Inference on Heterogeneous Edge Devices
by: Mesa, Alejandro Ruiz y, et al.
Published: (2026)
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
by: Agrawal, Sudhanshu, et al.
Published: (2025)
AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration
by: McDanel, Bradley
Published: (2024)
Mixture of Attentions For Speculative Decoding
by: Zimmer, Matthieu, et al.
Published: (2024)
Scaling Speculative Decoding with Lookahead Reasoning
by: Fu, Yichao, et al.
Published: (2025)
CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs
by: Ning, Zhiyuan, et al.
Published: (2025)