:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Xuliang; Chen, Yuetao; Zhen, Maochan; Liu, Fang; Zheng, Xinzhou; Liu, Xingwu; Xu, Hong; Li, Ming
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2602.01762

Similar Items

Make Every Draft Count: Hidden State based Speculative Decoding
by: Chen, Yuetao, et al.
Published: (2026)

Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
by: Wen, Zhuofan, et al.
Published: (2024)

Training Domain Draft Models for Speculative Decoding: Best Practices and Insights
by: Hong, Fenglu, et al.
Published: (2025)

$\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts
by: Cemri, Mert, et al.
Published: (2025)

Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs
by: Goel, Raghavv, et al.
Published: (2024)

POSS: Position Specialist Generates Better Draft for Speculative Decoding
by: Huang, Langlin, et al.
Published: (2025)

ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts
by: Georganas, Evangelos, et al.
Published: (2025)

Balancing Coverage and Draft Latency in Vocabulary Trimming for Faster Speculative Decoding
by: Shoham, Ofir Ben
Published: (2026)

LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
by: Yang, Penghui, et al.
Published: (2025)

PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding
by: An, Zihao, et al.
Published: (2026)

MineDraft: A Framework for Batch Parallel Speculative Decoding
by: Tang, Zhenwei, et al.
Published: (2026)

Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
by: Wang, Zilong, et al.
Published: (2024)

FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
by: Zhao, Weilin, et al.
Published: (2025)

Speculative Streaming: Fast LLM Inference without Auxiliary Models
by: Bhendawade, Nikhil, et al.
Published: (2024)

Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding
by: Xiao, Bin, et al.
Published: (2024)

SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding
by: Sun, Ryan, et al.
Published: (2024)

Draft Model Knows When to Stop: Self-Verification Speculative Decoding for Long-Form Generation
by: Zhang, Ziyin, et al.
Published: (2024)

TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
by: Qiu, Jiahao, et al.
Published: (2024)

Speculative Decoding for Multi-Sample Inference
by: Li, Yiwei, et al.
Published: (2025)

A Theoretical Perspective for Speculative Decoding Algorithm
by: Yin, Ming, et al.
Published: (2024)

Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
by: Timor, Nadav, et al.
Published: (2024)

Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency
by: Li, Ruixiao, et al.
Published: (2025)

Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling
by: Zheng, Kaiwen, et al.
Published: (2024)

AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures
by: Zhang, Situo, et al.
Published: (2024)

SpecExit: Accelerating Large Reasoning Model via Speculative Exit
by: Yang, Rubing, et al.
Published: (2025)

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
by: Elhoushi, Mostafa, et al.
Published: (2024)

TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding
by: Wu, Zhaoxuan, et al.
Published: (2025)

Fast Large Language Model Collaborative Decoding via Speculation
by: Fu, Jiale, et al.
Published: (2025)

Online Speculative Decoding
by: Liu, Xiaoxuan, et al.
Published: (2023)

DynaSpec: Context-aware Dynamic Speculative Sampling for Large-Vocabulary Language Models
by: Zhang, Jinbin, et al.
Published: (2025)

Traversal Verification for Speculative Tree Decoding
by: Weng, Yepeng, et al.
Published: (2025)

Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding
by: Ryu, Hyun, et al.
Published: (2024)

Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies
by: Timor, Nadav, et al.
Published: (2025)

SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts
by: Liu, Bingshuai, et al.
Published: (2025)

Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
by: Hu, Shijing, et al.
Published: (2025)

Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding
by: Shen, Yuhao, et al.
Published: (2026)

Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation
by: Liu, Yuhan, et al.
Published: (2025)

Language Models "Grok" to Copy
by: Lv, Ang, et al.
Published: (2024)

S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models
by: He, Tao, et al.
Published: (2025)

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
by: Zhong, Ming, et al.
Published: (2023)