| Main authors: | Gao, Xiangxiang, Xie, Weisheng, Xiang, Yiwei, Ji, Feng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2412.12639 |
Similar items
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
by: Li, Yuhui, et al.
Published: (2024)
FlashDecoding++: Faster Large Language Model Inference on GPUs
by: Hong, Ke, et al.
Published: (2023)
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding
by: Huang, Jianuo, et al.
Published: (2026)
Cascade Speculative Drafting for Even Faster LLM Inference
by: Chen, Ziyi, et al.
Published: (2023)
AR-MAP: Are Autoregressive Large Language Models Implicit Teachers for Diffusion Large Language Models?
by: Lin, Liang, et al.
Published: (2026)
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
by: Li, Jia-Nan, et al.
Published: (2025)
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
by: Zhao, Weilin, et al.
Published: (2024)
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
by: Wen, Zhuofan, et al.
Published: (2024)
PEARL: Parallel Speculative Decoding with Adaptive Draft Length
by: Liu, Tianyu, et al.
Published: (2024)
Exploring and Improving Drafts in Blockwise Parallel Decoding
by: Kim, Taehyeon, et al.
Published: (2024)
Blockwise SFT for Diffusion Language Models: Reconciling Bidirectional Attention and Autoregressive Decoding
by: Sun, Bowen, et al.
Published: (2025)
Adaptive Draft-Verification for Efficient Large Language Model Decoding
by: Liu, Xukun, et al.
Published: (2024)
Chain of Draft: Thinking Faster by Writing Less
by: Xu, Silei, et al.
Published: (2025)
Balancing Coverage and Draft Latency in Vocabulary Trimming for Faster Speculative Decoding
by: Shoham, Ofir Ben
Published: (2026)
Why Diffusion Language Models Struggle with Truly Parallel (Non-Autoregressive) Decoding?
by: Li, Pengxiang, et al.
Published: (2026)
Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding
by: Yi, Hanling, et al.
Published: (2024)
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
by: Zhang, Jun, et al.
Published: (2023)
PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding
by: An, Zihao, et al.
Published: (2026)
Breaking Block Boundaries: Anchor-based History-stable Decoding for Diffusion Large Language Models
by: Zou, Shun, et al.
Published: (2026)
CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credit
by: Wang, Kangyu, et al.
Published: (2025)
MineDraft: A Framework for Batch Parallel Speculative Decoding
by: Tang, Zhenwei, et al.
Published: (2026)
Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding
by: Cho, Sukmin, et al.
Published: (2025)
$\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts
by: Cemri, Mert, et al.
Published: (2025)
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
by: Timor, Nadav, et al.
Published: (2024)
Enhancing Chemical Reaction and Retrosynthesis Prediction with Large Language Model and Dual-task Learning
by: Lin, Xuan, et al.
Published: (2025)
Customizing Language Model Responses with Contrastive In-Context Learning
by: Gao, Xiang, et al.
Published: (2024)
Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding
by: Bao, Wenrui, et al.
Published: (2025)
Enhancing Molecular Property Prediction with Knowledge from Large Language Models
by: Zhou, Peng, et al.
Published: (2025)
Speculative Decoding for Multi-Sample Inference
by: Li, Yiwei, et al.
Published: (2025)
Cost-Aware Diffusion Draft Trees for Speculative Decoding
by: Zhang, Shuai, et al.
Published: (2026)
Accelerating Speculative Decoding with Block Diffusion Draft Trees
by: Ringel, Liran, et al.
Published: (2026)
Reviving Any-Subset Autoregressive Models with Principled Parallel Sampling and Speculative Decoding
by: Guo, Gabe, et al.
Published: (2025)
OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure
by: Wang, Jikai, et al.
Published: (2024)
Self Speculative Decoding for Diffusion Large Language Models
by: Gao, Yifeng, et al.
Published: (2025)
Plato: Plan to Efficiently Decode for Large Language Model Inference
by: Jin, Shuowei, et al.
Published: (2024)
RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval
by: Nguyen, Long, et al.
Published: (2025)
Enhancing Contextual Understanding in Large Language Models through Contrastive Decoding
by: Zhao, Zheng, et al.
Published: (2024)
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
by: Welleck, Sean, et al.
Published: (2024)
AuroraEdge-V-2B: A Faster And Stronger Edge Visual Large Language Model
by: Chen, Xiang
Published: (2026)
SciCustom: A Framework for Custom Evaluation of Scientific Capabilities in Large Language Models
by: Gu, Yiyang, et al.
Published: (2026)