| Main authors: | Zhou, Yuxuan; Huang, Fei; Li, Heng; Wu, Fengyi; Wang, Tianyu; Zhang, Jianwei; Lin, Junyang; Cheng, Zhi-Qi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Artificial Intelligence |
| Online access: | https://arxiv.org/abs/2601.05724 |
| _version_ | 1866915826856099840 |
|---|---|
| author | Zhou, Yuxuan; Huang, Fei; Li, Heng; Wu, Fengyi; Wang, Tianyu; Zhang, Jianwei; Lin, Junyang; Cheng, Zhi-Qi |
| contents | Verification is a key bottleneck in improving inference speed while maintaining distribution fidelity in Speculative Decoding. Recent work has shown that sequence-level verification leads to a higher number of accepted tokens compared to token-wise verification. However, existing solutions often rely on surrogate approximations or are constrained by partial information, struggling with joint intractability. In this work, we propose Hierarchical Speculative Decoding (HSD), a provably lossless verification method that significantly boosts the expected number of accepted tokens and overcomes joint intractability by balancing excess and deficient probability mass across accessible branches. Our extensive large-scale experiments demonstrate that HSD yields consistent improvements in acceptance rates across diverse model families and benchmarks. Moreover, its strong explainability and generality make it readily integrable into a wide range of speculative decoding frameworks. Notably, integrating HSD into EAGLE-3 yields over a 12% performance gain, establishing state-of-the-art decoding efficiency without compromising distribution fidelity. Code is available at https://github.com/ZhouYuxuanYX/Hierarchical-Speculative-Decoding. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2601_05724 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2601.05724 |
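The abstract contrasts sequence-level verification with token-wise verification. For reference, here is a minimal sketch of the standard token-wise rejection rule used in speculative sampling, the baseline the abstract compares against. It is not HSD itself. A drafted token x ~ q is accepted with probability min(1, p(x)/q(x)); on rejection, a replacement is drawn from the normalized residual max(0, p − q). This keeps the output distribution exactly equal to the target p (the "lossless" property). The function names (`sample`, `verify_token`, `verify_draft`) and the dict-based toy distributions are illustrative assumptions, not the API of the linked repository.

```python
import random
from typing import Dict, List, Tuple

# Toy distribution over a small vocabulary: token -> probability (or weight).
Dist = Dict[str, float]


def sample(dist: Dist, rng: random.Random) -> str:
    """Draw one token from a non-negative (possibly unnormalized) weight table."""
    r = rng.random() * sum(dist.values())
    acc = 0.0
    last = None
    for tok, w in dist.items():
        if w <= 0.0:
            continue
        acc += w
        last = tok
        if r < acc:
            return tok
    return last  # guard against floating-point rounding at the upper end


def verify_token(token: str, p: Dist, q: Dist, rng: random.Random) -> Tuple[bool, str]:
    """Token-wise check of one drafted token (illustrative, not HSD).

    p: target-model distribution, q: draft-model distribution at this position.
    Assumes `token` was sampled from q, so q[token] > 0.
    """
    ratio = p.get(token, 0.0) / q[token]
    if rng.random() < min(1.0, ratio):
        return True, token  # accepted as drafted
    # Rejected: resample from the normalized residual max(0, p - q).
    residual = {t: max(0.0, pt - q.get(t, 0.0)) for t, pt in p.items()}
    return False, sample(residual, rng)


def verify_draft(draft_tokens: List[str], target_dists: List[Dist],
                 draft_dists: List[Dist], rng: random.Random) -> List[str]:
    """Verify a drafted block left to right, stopping at the first rejection.

    target_dists has one more entry than draft_tokens: the last entry is used
    to sample a bonus token when every drafted token is accepted.
    """
    output: List[str] = []
    for i, tok in enumerate(draft_tokens):
        accepted, out_tok = verify_token(tok, target_dists[i], draft_dists[i], rng)
        output.append(out_tok)
        if not accepted:
            return output  # later drafted tokens are discarded
    output.append(sample(target_dists[len(draft_tokens)], rng))
    return output


if __name__ == "__main__":
    rng = random.Random(0)
    p = {"a": 0.6, "b": 0.3, "c": 0.1}  # hypothetical target distribution
    q = {"a": 0.2, "b": 0.5, "c": 0.3}  # hypothetical draft distribution
    counts = {t: 0 for t in p}
    n = 100_000
    for _ in range(n):
        _, tok = verify_token(sample(q, rng), p, q, rng)
        counts[tok] += 1
    # Empirical frequencies should be close to p, not q.
    print({t: round(c / n, 3) for t, c in counts.items()})
```

Running the script prints empirical frequencies close to p even though every draft comes from q. Because `verify_draft` stops at the first rejected position, all later drafted tokens in that step are thrown away. The abstract reports that sequence-level schemes, HSD among them, accept more tokens per step than this rule does.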