:: Library Catalog

Bibliographic Details
Main Authors:	Hoang, Duc; Jaiswal, Ajay; Samragh, Mohammad; Cho, Minsik
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.03921

Similar Items

SPD: Sync-Point Drop for Efficient Tensor Parallelism of Large Language Models
by: Kim, Han-Byul, et al.
Published: (2025)

TIDE: Every Layer Knows the Token Beneath the Context
by: Jaiswal, Ajay, et al.
Published: (2026)

MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE
by: Zibakhsh, Soheil, et al.
Published: (2025)

Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why
by: Armandpour, Mohammadreza, et al.
Published: (2026)

MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
by: Hannah, Lauren A., et al.
Published: (2025)

SpecMoE: A Fast and Efficient Mixture-of-Experts Inference via Self-Assisted Speculative Decoding
by: Bang, Jehyeon, et al.
Published: (2026)

Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
by: Samragh, Mohammad, et al.
Published: (2024)

Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations
by: Jaiswal, Ajay, et al.
Published: (2025)

SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences
by: Cha, Jungyoub, et al.
Published: (2025)

HiSpec: Hierarchical Speculative Decoding for LLMs
by: Kumar, Avinash, et al.
Published: (2025)

BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms
by: Hou, Yunlong, et al.
Published: (2025)

QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
by: Tiwari, Rishabh, et al.
Published: (2025)

MemoryLLM: Plug-n-Play Interpretable Feed-Forward Memory for Transformers
by: Jaiswal, Ajay, et al.
Published: (2026)

SpecMemo: Speculative Decoding is in Your Pocket
by: Yildirim, Selin, et al.
Published: (2025)

Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance
by: Wang, Songsheng, et al.
Published: (2025)

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
by: Pan, Rui, et al.
Published: (2025)

Speculating Experts Accelerates Inference for Mixture-of-Experts
by: Madan, Vivan, et al.
Published: (2026)

DistillSpec: Improving Speculative Decoding via Knowledge Distillation
by: Zhou, Yongchao, et al.
Published: (2023)

Towards Low-bit Communication for Tensor Parallel LLM Inference
by: Dong, Harry, et al.
Published: (2024)

ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts
by: Georganas, Evangelos, et al.
Published: (2025)

SpecExit: Accelerating Large Reasoning Model via Speculative Exit
by: Yang, Rubing, et al.
Published: (2025)

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
by: Huang, Kaixuan, et al.
Published: (2024)

BubbleSpec: Turning Long-Tail Bubbles into Speculative Rollout Drafts for Synchronous Reinforcement Learning
by: Xu, Yuhang, et al.
Published: (2026)

KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem
by: Cha, Seongjin, et al.
Published: (2026)

CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs
by: Ning, Zhiyuan, et al.
Published: (2025)

SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding
by: Li, Shenggui, et al.
Published: (2026)

SpecBound: Adaptive Bounded Self-Speculation with Layer-wise Confidence Calibration
by: Wen, Zhuofan, et al.
Published: (2026)

LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
by: Yang, Penghui, et al.
Published: (2025)

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
by: Fu, Qichen, et al.
Published: (2024)

Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context
by: Alizadeh, Keivan, et al.
Published: (2026)

DynaSpec: Context-aware Dynamic Speculative Sampling for Large-Vocabulary Language Models
by: Zhang, Jinbin, et al.
Published: (2025)

Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications
by: Hoang, Duc N. M., et al.
Published: (2023)

FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
by: Zhao, Weilin, et al.
Published: (2025)

Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity
by: Joudaki, Amir, et al.
Published: (2025)

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
by: Yin, Lu, et al.
Published: (2023)

LLaGA: Large Language and Graph Assistant
by: Chen, Runjin, et al.
Published: (2024)

Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential
by: Samragh, Mohammad, et al.
Published: (2025)

Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching
by: Dong, Yanhao, et al.
Published: (2025)

Explainable AI in Time-Sensitive Scenarios: Prefetched Offline Explanation Model
by: Russo, Fabio Michele, et al.
Published: (2025)

SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection
by: Shukla, Shikhar
Published: (2026)