Bibliographic Details
Main Authors:	Li, Yingxin; Li, Ye; Meng, Yuan; Ma, Xinzhu; Geng, Zihan; Xia, Shutao; Wang, Zhi
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2412.08521

Similar Items

HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing
by: Liu, Minghui, et al.
Published: (2024)

Block-wise Adaptive Caching for Accelerating Diffusion Policy
by: Ji, Kangye, et al.
Published: (2025)

On the Limits of Learned Importance Scoring for KV Cache Compression
by: Steele, Brady
Published: (2026)

Learning to Evict from Key-Value Cache
by: Moschella, Luca, et al.
Published: (2026)

KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head
by: Rehg, Isaac
Published: (2024)

Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
by: Du, Wenjie, et al.
Published: (2025)

SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging
by: Wu, Shunlong, et al.
Published: (2026)

CaliDrop: KV Cache Compression with Calibration
by: Su, Yi, et al.
Published: (2025)

KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
by: Tian, Yuxuan, et al.
Published: (2025)

Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression
by: Zhang, Te, et al.
Published: (2025)

ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
by: Yan, Xianglong, et al.
Published: (2025)

SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
by: Li, Ye, et al.
Published: (2025)

DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
by: Zhou, Xiabin, et al.
Published: (2024)

RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
by: Tang, Hanlin, et al.
Published: (2024)

Head-Aware KV Cache Compression for Efficient Visual Autoregressive Modeling
by: Qin, Ziran, et al.
Published: (2025)

ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
by: Ramachandran, Akshat, et al.
Published: (2025)

Effectively Compress KV Heads for LLM
by: Yu, Hao, et al.
Published: (2024)

The Pitfalls of KV Cache Compression
by: Chen, Alex, et al.
Published: (2025)

KVReviver: Reversible KV Cache Compression with Sketch-Based Token Reconstruction
by: Yuan, Aomufei, et al.
Published: (2025)

CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
by: Wang, Yixuan, et al.
Published: (2025)

LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
by: Shen, Yiqun, et al.
Published: (2025)

Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks
by: Wang, Zheng, et al.
Published: (2024)

Graph-Guided Adaptive Channel Elimination for KV Cache Compression
by: Tong, Enwei, et al.
Published: (2026)

Adaptive KV-Cache Compression without Manually Setting Budget
by: Tang, Chenxia, et al.
Published: (2025)

R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
by: Cai, Zefan, et al.
Published: (2025)

LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction
by: Zhou, Enshuai, et al.
Published: (2026)

AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
by: Yang, Qingyue, et al.
Published: (2025)

KV-CoRE: Benchmarking Data-Dependent Low-Rank Compressibility of KV-Caches in LLMs
by: Chen, Jian, et al.
Published: (2026)

Pyramid Cache: Layer-Adaptive KV Cache Compression with Signature-Based Cold Storage
by: Sergio dj
Published: (2026)

Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
by: Fu, Yu, et al.
Published: (2024)

LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression
by: Hu, Lianyu, et al.
Published: (2025)

ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression
by: Liu, Guangda, et al.
Published: (2024)

FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference
by: Zhao, Bingzhe, et al.
Published: (2025)

PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference
by: Li, Ye, et al.
Published: (2024)

Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
by: Liu, Xuyang, et al.
Published: (2025)

Training Transformers for KV Cache Compressibility
by: Gelberg, Yoav, et al.
Published: (2026)

Lossless KV Cache Compression to 2%
by: Yang, Zhen, et al.
Published: (2024)

KVSculpt: KV Cache Compression as Distillation
by: Jiang, Bo, et al.
Published: (2026)

Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity
by: Ma, Da, et al.
Published: (2024)

EvolKV: Evolutionary KV Cache Compression for LLM Inference
by: Yu, Bohan, et al.
Published: (2025)