:: Library Catalog

Bibliographic Details
Main Authors:	Zweiger, Adam; Fu, Xinghong; Guo, Han; Kim, Yoon
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.16284

Similar Items

Self-Adapting Language Models
by: Zweiger, Adam, et al.
Published: (2025)

The Surprising Effectiveness of Test-Time Training for Few-Shot Learning
by: Akyürek, Ekin, et al.
Published: (2024)

Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting
by: Fu, Xinghong, et al.
Published: (2026)

Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction
by: Kim, Jang-Hyun, et al.
Published: (2026)

FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration
by: Jo, Dongwon, et al.
Published: (2025)

Financial Fine-tuning a Large Time Series Model
by: Fu, Xinghong, et al.
Published: (2024)

Do Two AI Scientists Agree?
by: Fu, Xinghong, et al.
Published: (2025)

Log-Linear Attention
by: Guo, Han, et al.
Published: (2025)

LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
by: Ahn, Jinwoo, et al.
Published: (2026)

DiffKV: Differentiated Memory Management for Large Language Models with Parallel KV Compaction
by: Zhang, Yanqi, et al.
Published: (2024)

DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation
by: Liu, Jinxin, et al.
Published: (2024)

In-context KV-Cache Eviction for LLMs via Attention-Gate
by: Zeng, Zihao, et al.
Published: (2024)

Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
by: Kim, Minsu, et al.
Published: (2025)

RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
by: Tang, Hanlin, et al.
Published: (2024)

RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
by: Behnam, Payman, et al.
Published: (2025)

KV Cache Transform Coding for Compact Storage in LLM Inference
by: Staniszewski, Konrad, et al.
Published: (2025)

ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs
by: Qi, Yanlin, et al.
Published: (2026)

Warm-Start Flow Matching for Guaranteed Fast Text/Image Generation
by: Kim, Minyoung
Published: (2026)

SALS: Sparse Attention in Latent Space for KV cache Compression
by: Mu, Junlin, et al.
Published: (2025)

LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning
by: Zhang, Haoyue, et al.
Published: (2025)

SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving
by: Guo, Yipin, et al.
Published: (2026)

Fast Ensembling with Diffusion Schrödinger Bridge
by: Kim, Hyunsu, et al.
Published: (2024)

EntmaxKV: Support-Aware Decoding for Entmax Attention
by: Duarte, Gonçalo, et al.
Published: (2026)

Sparse Attention across Multiple-context KV Cache
by: Cao, Ziyi, et al.
Published: (2025)

Sparse Attention as Compact Kernel Regression
by: Santos, Saul, et al.
Published: (2026)

KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity
by: Lesens, Damien, et al.
Published: (2025)

HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs
by: Yang, Dongquan, et al.
Published: (2025)

Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
by: Saxena, Utkarsh, et al.
Published: (2024)

Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters
by: Guo, Zhiyu, et al.
Published: (2024)

MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference
by: Rhee, Myunghyun, et al.
Published: (2025)

Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete Diffusion
by: Kim, Jaihoon, et al.
Published: (2026)

Fast Matrix Multiplications for Lookup Table-Quantized LLMs
by: Guo, Han, et al.
Published: (2024)

AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
by: Yang, Qingyue, et al.
Published: (2025)

ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference
by: Zhang, Qiuyang, et al.
Published: (2026)

Thin Keys, Full Values: Reducing KV Cache via Low-Dimensional Attention Selection
by: Yao, Hengshuai, et al.
Published: (2026)

CAOTE: KV Cache Selection for LLMs via Attention Output Error-Based Token Eviction
by: Goel, Raghavv, et al.
Published: (2025)

CoKV: Optimizing KV Cache Allocation via Cooperative Game
by: Sun, Qiheng, et al.
Published: (2025)

KVzap: Fast, Adaptive, and Faithful KV Cache Pruning
by: Jegou, Simon, et al.
Published: (2026)

Augmenting Attention with Exponentially Decaying Memory Improves Query-Aware KV Sparsity
by: Wei, Xiuying, et al.
Published: (2026)

MAC-Attention: a Match-Amend-Complete Scheme for Fast and Accurate Attention Computation
by: Yao, Jinghan, et al.
Published: (2026)