| Main Authors: | Ma, Xindian; Lu, Yidi; Zhang, Peng; Zhang, Jing |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.02197 |
Similar Items
CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM
by: Li, Yubo, et al.
Published: (2026)
MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
by: Li, Kunxi, et al.
Published: (2025)
RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction
by: Liu, Sihao, et al.
Published: (2026)
LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
by: Shen, Yiqun, et al.
Published: (2025)
RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache
by: Zhang, Junkai, et al.
Published: (2026)
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
by: Ahn, Jinwoo, et al.
Published: (2026)
EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving
by: Feng, Shaoting, et al.
Published: (2025)
CrossQuant: A Post-Training Quantization Method with Smaller Quantization Kernel for Precise Large Language Model Compression
by: Liu, Wenyuan, et al.
Published: (2024)
Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression
by: Zhang, Te, et al.
Published: (2025)
Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective
by: Yang, Jiaming, et al.
Published: (2026)
When Does Value-Aware KV Eviction Help? A Fixed-Contract Diagnostic for Non-Monotone Cache Compression
by: Zhang, Ruijie, et al.
Published: (2026)
MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
by: Liu, Wenyuan, et al.
Published: (2025)
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
by: Liu, Akide, et al.
Published: (2024)
ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
by: He, Yefei, et al.
Published: (2024)
LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference
by: Wang, Guangtao, et al.
Published: (2025)
Tensor Cache: Eviction-conditioned Associative Memory for Transformers
by: Swain, Kabir, et al.
Published: (2026)
CoKV: Optimizing KV Cache Allocation via Cooperative Game
by: Sun, Qiheng, et al.
Published: (2025)
SEE: Sememe Entanglement Encoding for Transformer-bases Models Compression
by: Zhang, Jing, et al.
Published: (2024)
ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs
by: Meng, Haoqian, et al.
Published: (2026)
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
by: Tiwari, Rishabh, et al.
Published: (2025)
KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs
by: Chen, Chuangtao, et al.
Published: (2026)
AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models
by: Gu, Yifeng, et al.
Published: (2025)
G-KV: Decoding-Time KV Cache Eviction with Global Attention
by: Liao, Mengqi, et al.
Published: (2025)
SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference
by: Zhao, Yi, et al.
Published: (2025)
LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
by: Wu, Wenbo, et al.
Published: (2025)
LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management
by: Xiong, Yi, et al.
Published: (2024)
SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language Models
by: Sun, Wenhao, et al.
Published: (2026)
HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing
by: Liu, Minghui, et al.
Published: (2024)
One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
by: Lu, Liming, et al.
Published: (2026)
The Pitfalls of KV Cache Compression
by: Chen, Alex, et al.
Published: (2025)
SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning
by: Kariyappa, Sanjay, et al.
Published: (2026)
KVzap: Fast, Adaptive, and Faithful KV Cache Pruning
by: Jegou, Simon, et al.
Published: (2026)
Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing
by: Filippova, Anastasiia, et al.
Published: (2026)
ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression
by: Liu, Guangda, et al.
Published: (2024)
ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
by: Yan, Xianglong, et al.
Published: (2025)
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)
Revisiting Multimodal KV Cache Compression: A Frequency-Domain-Guided Outlier-KV-Aware Approach
by: Yang, Yaoxin, et al.
Published: (2025)
MixKVQ: Query-Aware Mixed-Precision KV Cache Quantization for Long-Context Reasoning
by: Zhang, Tao, et al.
Published: (2025)
CacheClip: Accelerating RAG with Effective KV Cache Reuse
by: Yang, Bin, et al.
Published: (2025)
IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs
by: Mao, Yuzhen, et al.
Published: (2026)