:: Library Catalog

Bibliographic Details
Main Authors:	Ma, Xindian; Lu, Yidi; Zhang, Peng; Zhang, Jing
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.02197

Similar Items

CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM
by: Li, Yubo, et al.
Published: (2026)

MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
by: Li, Kunxi, et al.
Published: (2025)

RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction
by: Liu, Sihao, et al.
Published: (2026)

LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
by: Shen, Yiqun, et al.
Published: (2025)

RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache
by: Zhang, Junkai, et al.
Published: (2026)

LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
by: Ahn, Jinwoo, et al.
Published: (2026)

EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving
by: Feng, Shaoting, et al.
Published: (2025)

CrossQuant: A Post-Training Quantization Method with Smaller Quantization Kernel for Precise Large Language Model Compression
by: Liu, Wenyuan, et al.
Published: (2024)

Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression
by: Zhang, Te, et al.
Published: (2025)

Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective
by: Yang, Jiaming, et al.
Published: (2026)

When Does Value-Aware KV Eviction Help? A Fixed-Contract Diagnostic for Non-Monotone Cache Compression
by: Zhang, Ruijie, et al.
Published: (2026)

MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
by: Liu, Wenyuan, et al.
Published: (2025)

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
by: Liu, Akide, et al.
Published: (2024)

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
by: He, Yefei, et al.
Published: (2024)

LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference
by: Wang, Guangtao, et al.
Published: (2025)

Tensor Cache: Eviction-conditioned Associative Memory for Transformers
by: Swain, Kabir, et al.
Published: (2026)

CoKV: Optimizing KV Cache Allocation via Cooperative Game
by: Sun, Qiheng, et al.
Published: (2025)

SEE: Sememe Entanglement Encoding for Transformer-bases Models Compression
by: Zhang, Jing, et al.
Published: (2024)

ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs
by: Meng, Haoqian, et al.
Published: (2026)

QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
by: Tiwari, Rishabh, et al.
Published: (2025)

KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs
by: Chen, Chuangtao, et al.
Published: (2026)

AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models
by: Gu, Yifeng, et al.
Published: (2025)

G-KV: Decoding-Time KV Cache Eviction with Global Attention
by: Liao, Mengqi, et al.
Published: (2025)

SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference
by: Zhao, Yi, et al.
Published: (2025)

LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
by: Wu, Wenbo, et al.
Published: (2025)

LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management
by: Xiong, Yi, et al.
Published: (2024)

SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language Models
by: Sun, Wenhao, et al.
Published: (2026)

HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing
by: Liu, Minghui, et al.
Published: (2024)

One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
by: Lu, Liming, et al.
Published: (2026)

The Pitfalls of KV Cache Compression
by: Chen, Alex, et al.
Published: (2025)

SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning
by: Kariyappa, Sanjay, et al.
Published: (2026)

KVzap: Fast, Adaptive, and Faithful KV Cache Pruning
by: Jegou, Simon, et al.
Published: (2026)

Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing
by: Filippova, Anastasiia, et al.
Published: (2026)

ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression
by: Liu, Guangda, et al.
Published: (2024)

ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
by: Yan, Xianglong, et al.
Published: (2025)

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)

Revisiting Multimodal KV Cache Compression: A Frequency-Domain-Guided Outlier-KV-Aware Approach
by: Yang, Yaoxin, et al.
Published: (2025)

MixKVQ: Query-Aware Mixed-Precision KV Cache Quantization for Long-Context Reasoning
by: Zhang, Tao, et al.
Published: (2025)

CacheClip: Accelerating RAG with Effective KV Cache Reuse
by: Yang, Bin, et al.
Published: (2025)

IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs
by: Mao, Yuzhen, et al.
Published: (2026)