Library Catalog

Bibliographic Details
Main Authors:	Thomas, Rahul; Kitanovski, Teo; Goldblum, Micah; Pal, Arka
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.16994

Similar Items

Knowing What You Know Is Not Enough: Large Language Model Confidences Don't Align With Their Actions
by: Pal, Arka, et al.
Published: (2025)

Greedy Multi-Path Block Verification for Faster Decoding in Speculative Sampling
by: Thomas, Rahul, et al.
Published: (2026)

Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization
by: Thomas, Rahul Krishna, et al.
Published: (2025)

Cascade: Token-Sharded Private LLM Inference
by: Thomas, Rahul, et al.
Published: (2025)

An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs
by: Thomas, Rahul, et al.
Published: (2025)

vTune: Verifiable Fine-Tuning for LLMs Through Backdooring
by: Zhang, Eva, et al.
Published: (2024)

Privacy-Preserving Mechanisms Enable Cheap Verifiable Inference of LLMs
by: Pal, Arka, et al.
Published: (2026)

Large Language Models Must Be Taught to Know What They Don't Know
by: Kapoor, Sanyam, et al.
Published: (2024)

The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning
by: Goldblum, Micah, et al.
Published: (2023)

Speculative Speculative Decoding
by: Kumar, Tanishq, et al.
Published: (2026)

Multi-Token Prediction via Self-Distillation
by: Kirchenbauer, John, et al.
Published: (2026)

Adaptive Retention & Correction: Test-Time Training for Continual Learning
by: Chen, Haoran, et al.
Published: (2024)

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
by: Marek, Martin, et al.
Published: (2025)

Compute Better Spent: Replacing Dense Layers with Structured Matrices
by: Qiu, Shikai, et al.
Published: (2024)

Decoding Speculative Decoding
by: Yan, Minghao, et al.
Published: (2024)

The Lie Derivative for Measuring Learned Equivariance
by: Gruver, Nate, et al.
Published: (2022)

DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure
by: Xiong, Yunfan, et al.
Published: (2024)

On Training in Imagination
by: Timor, Nadav, et al.
Published: (2026)

Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks
by: Li, Ang, et al.
Published: (2025)

Draft, Verify, and Improve: Toward Training-Aware Speculative Decoding
by: Bhansali, Shrenik, et al.
Published: (2025)

Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
by: Lotfi, Sanae, et al.
Published: (2024)

Traversal Verification for Speculative Tree Decoding
by: Weng, Yepeng, et al.
Published: (2025)

Gemstones: A Model Suite for Multi-Faceted Scaling Laws
by: McLeish, Sean, et al.
Published: (2025)

Online Speculative Decoding
by: Liu, Xiaoxuan, et al.
Published: (2023)

DistillSpec: Improving Speculative Decoding via Knowledge Distillation
by: Zhou, Yongchao, et al.
Published: (2023)

Identifying and Evaluating Inactive Heads in Pretrained LLMs
by: Sandoval-Segura, Pedro, et al.
Published: (2025)

Non-Vacuous Generalization Bounds for Large Language Models
by: Lotfi, Sanae, et al.
Published: (2023)

Speculative Decoding Across Languages
by: Paudel, Nirajan, et al.
Published: (2026)

Speculative Safety-Aware Decoding
by: Wang, Xuekang, et al.
Published: (2025)

Closing the Train-Test Gap in World Models for Gradient-Based Planning
by: Parthasarathy, Arjun, et al.
Published: (2025)

Just How Flexible are Neural Networks in Practice?
by: Shwartz-Ziv, Ravid, et al.
Published: (2024)

TapOut: A Bandit-Based Approach to Dynamic Speculative Decoding
by: Sridhar, Aditya, et al.
Published: (2025)

Fast Inference via Hierarchical Speculative Decoding
by: Mohri, Clara, et al.
Published: (2025)

Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding
by: Guan, Yue, et al.
Published: (2025)

SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification
by: Tan, Zhendong, et al.
Published: (2025)

Mixture of Attentions For Speculative Decoding
by: Zimmer, Matthieu, et al.
Published: (2024)

Scaling Speculative Decoding with Lookahead Reasoning
by: Fu, Yichao, et al.
Published: (2025)

STree: Speculative Tree Decoding for Hybrid State-Space Models
by: Wu, Yangchao, et al.
Published: (2025)

SpecMER: Fast Protein Generation with K-mer Guided Speculative Decoding
by: Walton, Thomas, et al.
Published: (2025)

TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks
by: Feuer, Benjamin, et al.
Published: (2024)