Bibliographic Details
Main authors: Zhou, Yuxuan, Huang, Fei, Li, Heng, Wu, Fengyi, Wang, Tianyu, Zhang, Jianwei, Lin, Junyang, Cheng, Zhi-Qi
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online access: https://arxiv.org/abs/2601.05724
author Zhou, Yuxuan
Huang, Fei
Li, Heng
Wu, Fengyi
Wang, Tianyu
Zhang, Jianwei
Lin, Junyang
Cheng, Zhi-Qi
contents Verification is a key bottleneck in improving inference speed while maintaining distribution fidelity in Speculative Decoding. Recent work has shown that sequence-level verification leads to a higher number of accepted tokens compared to token-wise verification. However, existing solutions often rely on surrogate approximations or are constrained by partial information, struggling with joint intractability. In this work, we propose Hierarchical Speculative Decoding (HSD), a provably lossless verification method that significantly boosts the expected number of accepted tokens and overcomes joint intractability by balancing excess and deficient probability mass across accessible branches. Our extensive large-scale experiments demonstrate that HSD yields consistent improvements in acceptance rates across diverse model families and benchmarks. Moreover, its strong explainability and generality make it readily integrable into a wide range of speculative decoding frameworks. Notably, integrating HSD into EAGLE-3 yields over a 12% performance gain, establishing state-of-the-art decoding efficiency without compromising distribution fidelity. Code is available at https://github.com/ZhouYuxuanYX/Hierarchical-Speculative-Decoding.
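For orientation, the abstract contrasts sequence-level verification with token-wise verification. The sketch below shows the standard token-wise rule from speculative sampling: accept a drafted token x with probability min(1, p(x)/q(x)), where p is the target model and q the draft model; otherwise resample from the normalized residual max(0, p - q) and stop. This rule is lossless for each step. It is NOT HSD, whose algorithm this record does not describe. The function names (sample, residual, mass_balance, verify_tokenwise), the dict-based distributions, and the reading of "excess" mass as max(0, q - p) and "deficient" mass as max(0, p - q) are illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple

# Assumed representation: a next-token distribution as {token: probability}.
Dist = Dict[str, float]


def sample(dist: Dist, rng: random.Random) -> str:
    """Draw one token from a (possibly unnormalized) distribution."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def residual(p: Dist, q: Dist) -> Dist:
    """Normalized max(0, p - q): where the target p puts more mass than the draft q."""
    mass = {t: max(0.0, pt - q.get(t, 0.0)) for t, pt in p.items()}
    total = sum(mass.values())
    return {t: m / total for t, m in mass.items() if m > 0.0}


def mass_balance(p: Dist, q: Dist) -> Tuple[float, float, float]:
    """Return (deficient, excess, acceptance probability) for one decoding step."""
    support = set(p) | set(q)
    deficient = sum(max(0.0, p.get(t, 0.0) - q.get(t, 0.0)) for t in support)
    excess = sum(max(0.0, q.get(t, 0.0) - p.get(t, 0.0)) for t in support)
    accept = sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in support)
    return deficient, excess, accept


def verify_tokenwise(draft: List[str], q_dists: List[Dist], p_dists: List[Dist],
                     rng: random.Random) -> List[str]:
    """Standard token-wise speculative sampling (a baseline, not HSD).

    q_dists[i] is the draft model's distribution that produced draft[i];
    p_dists[i] is the target model's distribution at the same position.
    p_dists may hold one extra entry, used for a bonus token when every
    draft token is accepted.
    """
    out: List[str] = []
    for tok, q, p in zip(draft, q_dists, p_dists):
        if rng.random() < min(1.0, p.get(tok, 0.0) / q[tok]):
            out.append(tok)  # accepted: keep the drafted token
        else:
            # Rejected: resample from the residual; later draft tokens are discarded.
            out.append(sample(residual(p, q), rng))
            return out
    if len(p_dists) > len(draft):
        out.append(sample(p_dists[len(draft)], rng))  # bonus token from the target
    return out
```

In this single-step view the excess and deficient totals always coincide, and the acceptance probability equals one minus that total. Per the abstract, HSD instead balances these masses across accessible branches of a multi-token draft; the sketch does not attempt that.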
format Preprint
id arxiv_https___arxiv_org_abs_2601_05724
institution arXiv
publishDate 2026
record_format arxiv
title Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
topic Artificial Intelligence
url https://arxiv.org/abs/2601.05724