| Main authors: | Agrawal, Sudhanshu; Jeon, Wonseok; Lee, Mingu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online access: | https://arxiv.org/abs/2410.18351 |
Similar items
Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs
by: Goel, Raghavv, et al.
Published: (2024)
On Speculative Decoding for Multimodal Large Language Models
by: Gagrani, Mukul, et al.
Published: (2024)
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
by: Agrawal, Sudhanshu, et al.
Published: (2025)
VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
by: Goel, Raghavv, et al.
Published: (2025)
AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures
by: Zhang, Situo, et al.
Published: (2024)
Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement
by: Jeon, Wonseok, et al.
Published: (2024)
Training-free Dropout Sampling for Semantic Token Acceptance in Speculative Decoding
by: Lee, Jeongtae, et al.
Published: (2026)
TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding
by: Wu, Zhaoxuan, et al.
Published: (2025)
Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding
by: Cho, Sukmin, et al.
Published: (2025)
Model-free Speculative Decoding for Transformer-based ASR with Token Map Drafting
by: Ho, Tuan Vu, et al.
Published: (2025)
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
by: Zhang, Jun, et al.
Published: (2023)
Draft Model Knows When to Stop: Self-Verification Speculative Decoding for Long-Form Generation
by: Zhang, Ziyin, et al.
Published: (2024)
A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs
by: Goel, Raghavv, et al.
Published: (2026)
AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference
by: Lu, Kuan-Wei, et al.
Published: (2025)
PEARL: Parallel Speculative Decoding with Adaptive Draft Length
by: Liu, Tianyu, et al.
Published: (2024)
Cost-Aware Diffusion Draft Trees for Speculative Decoding
by: Zhang, Shuai, et al.
Published: (2026)
Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning
by: Zhang, Jiebin, et al.
Published: (2026)
Accelerating Speculative Decoding with Block Diffusion Draft Trees
by: Ringel, Liran, et al.
Published: (2026)
AdaSpec: Adaptive Speculative Decoding for Fast, SLO-Aware Large Language Model Serving
by: Huang, Kaiyu, et al.
Published: (2025)
DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding
by: Hu, Yunhai, et al.
Published: (2025)
Speculative Decoding Speed-of-Light: Optimal Lower Bounds via Branching Random Walks
by: Pankratov, Sergey, et al.
Published: (2025)
Flatter Tokens are More Valuable for Speculative Draft Model Training
by: Fan, Jiaming, et al.
Published: (2026)
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
by: Wen, Zhuofan, et al.
Published: (2024)
OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure
by: Wang, Jikai, et al.
Published: (2024)
TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs
by: Lee, Minjae, et al.
Published: (2026)
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
by: Liu, Tianyu, et al.
Published: (2025)
Acceptance Dynamics Across Cognitive Domains in Speculative Decoding
by: Mahmoud, Saif
Published: (2026)
SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding
by: Sun, Ryan, et al.
Published: (2024)
FlexDraft: Flexible Speculative Decoding via Attention Tuning and Bonus-Guided Calibration
by: Zhang, Yaojie, et al.
Published: (2026)
Make Every Draft Count: Hidden State based Speculative Decoding
by: Chen, Yuetao, et al.
Published: (2026)
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
by: Lv, Kai, et al.
Published: (2025)
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
by: Zhao, Weilin, et al.
Published: (2024)
SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting
by: Shi, Weijie, et al.
Published: (2026)
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding
by: Huang, Jianuo, et al.
Published: (2026)
Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity
by: Metel, Michael R., et al.
Published: (2024)
DFlare: Scaling Up Draft Capacity for Block Diffusion Speculative Decoding
by: Zhang, Jiebin, et al.
Published: (2026)
POSS: Position Specialist Generates Better Draft for Speculative Decoding
by: Huang, Langlin, et al.
Published: (2025)
LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding
by: Samarin, Alexander, et al.
Published: (2026)
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders
by: Hu, Yuezhou, et al.
Published: (2025)
Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
by: Hu, Shijing, et al.
Published: (2025)