| Main Authors: | Liu, Fuliang, Li, Xue, Zhao, Ketai, Gao, Yinxi, Zhou, Ziyan, Zhang, Zhonghui, Wang, Zhibin, Dou, Wanchun, Zhong, Sheng, Tian, Chen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.19278 |
Similar Items
Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding
by: Wang, Zhibin, et al.
Published: (2025)
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
by: Yi, Euiin, et al.
Published: (2024)
Tutorial Proposal: Speculative Decoding for Efficient LLM Inference
by: Xia, Heming, et al.
Published: (2025)
SSV: Sparse Speculative Verification for Efficient LLM Inference
by: Wang, Zhibin, et al.
Published: (2026)
Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference
by: Bhendawade, Nikhil, et al.
Published: (2025)
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
by: Xia, Heming, et al.
Published: (2024)
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation
by: Han, Ligong, et al.
Published: (2026)
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
by: Svirschevski, Ruslan, et al.
Published: (2024)
Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency
by: Li, Ruixiao, et al.
Published: (2025)
DFlash: Block Diffusion for Flash Speculative Decoding
by: Chen, Jian, et al.
Published: (2026)
Self Speculative Decoding for Diffusion Large Language Models
by: Gao, Yifeng, et al.
Published: (2025)
Speculative Decoding for Multi-Sample Inference
by: Li, Yiwei, et al.
Published: (2025)
SDSAT: Accelerating LLM Inference through Speculative Decoding with Semantic Adaptive Tokens
by: Liu, Chengbo, et al.
Published: (2024)
Speculative Streaming: Fast LLM Inference without Auxiliary Models
by: Bhendawade, Nikhil, et al.
Published: (2024)
TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar
by: Li, Yinxi, et al.
Published: (2025)
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
by: Wen, Zhuofan, et al.
Published: (2024)
Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies
by: Timor, Nadav, et al.
Published: (2025)
Automatic Task Detection and Heterogeneous LLM Speculative Decoding
by: Ge, Danying, et al.
Published: (2025)
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
by: Li, Minghan, et al.
Published: (2024)
RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding
by: Chen, Guanzheng, et al.
Published: (2025)
Speculative Contrastive Decoding
by: Yuan, Hongyi, et al.
Published: (2023)
Cacheback: Speculative Decoding With Nothing But Cache
by: Ma, Zhiyao, et al.
Published: (2025)
Fast Best-of-N Decoding via Speculative Rejection
by: Sun, Hanshi, et al.
Published: (2024)
Cross-Attention Speculative Decoding
by: Zhong, Wei, et al.
Published: (2025)
Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference
by: Zhang, Libo, et al.
Published: (2024)
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
by: Agrawal, Sudhanshu, et al.
Published: (2025)
DReSD: Dense Retrieval for Speculative Decoding
by: Gritta, Milan, et al.
Published: (2025)
Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism
by: Liu, Jiahao, et al.
Published: (2024)
Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding
by: Sun, Shuoyang, et al.
Published: (2026)
KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning
by: Zhang, Kaiqi, et al.
Published: (2024)
SpecTr-GBV: Multi-Draft Block Verification Accelerating Speculative Decoding
by: Lin, Yijun, et al.
Published: (2026)
Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding
by: Xiao, Bin, et al.
Published: (2024)
Fast Large Language Model Collaborative Decoding via Speculation
by: Fu, Jiale, et al.
Published: (2025)
Calibrated Speculative Decoding: Frequency-Guided Candidate Selection for Efficient Inference
by: Zhou, Xuwen, et al.
Published: (2026)
Decoding Speculative Decoding
by: Yan, Minghao, et al.
Published: (2024)
DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding
by: Li, Guanghao, et al.
Published: (2025)
Speculative Decoding with a Speculative Vocabulary
by: Williams, Miles, et al.
Published: (2026)
Cost-Aware Diffusion Draft Trees for Speculative Decoding
by: Zhang, Shuai, et al.
Published: (2026)
Accelerating Speculative Decoding with Block Diffusion Draft Trees
by: Ringel, Liran, et al.
Published: (2026)
Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding
by: Fang, Xun, et al.
Published: (2026)