| Main Authors: | Li, Yingxin, Li, Ye, Meng, Yuan, Ma, Xinzhu, Geng, Zihan, Xia, Shutao, Wang, Zhi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.08521 |
Similar Items
HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing
by: Liu, Minghui, et al.
Published: (2024)
Block-wise Adaptive Caching for Accelerating Diffusion Policy
by: Ji, Kangye, et al.
Published: (2025)
On the Limits of Learned Importance Scoring for KV Cache Compression
by: Steele, Brady
Published: (2026)
Learning to Evict from Key-Value Cache
by: Moschella, Luca, et al.
Published: (2026)
KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head
by: Rehg, Isaac
Published: (2024)
Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
by: Du, Wenjie, et al.
Published: (2025)
SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging
by: Wu, Shunlong, et al.
Published: (2026)
CaliDrop: KV Cache Compression with Calibration
by: Su, Yi, et al.
Published: (2025)
KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
by: Tian, Yuxuan, et al.
Published: (2025)
Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression
by: Zhang, Te, et al.
Published: (2025)
ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
by: Yan, Xianglong, et al.
Published: (2025)
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
by: Li, Ye, et al.
Published: (2025)
DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
by: Zhou, Xiabin, et al.
Published: (2024)
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
by: Tang, Hanlin, et al.
Published: (2024)
Head-Aware KV Cache Compression for Efficient Visual Autoregressive Modeling
by: Qin, Ziran, et al.
Published: (2025)
ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
by: Ramachandran, Akshat, et al.
Published: (2025)
Effectively Compress KV Heads for LLM
by: Yu, Hao, et al.
Published: (2024)
The Pitfalls of KV Cache Compression
by: Chen, Alex, et al.
Published: (2025)
KVReviver: Reversible KV Cache Compression with Sketch-Based Token Reconstruction
by: Yuan, Aomufei, et al.
Published: (2025)
CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
by: Wang, Yixuan, et al.
Published: (2025)
LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
by: Shen, Yiqun, et al.
Published: (2025)
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks
by: Wang, Zheng, et al.
Published: (2024)
Graph-Guided Adaptive Channel Elimination for KV Cache Compression
by: Tong, Enwei, et al.
Published: (2026)
Adaptive KV-Cache Compression without Manually Setting Budget
by: Tang, Chenxia, et al.
Published: (2025)
R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
by: Cai, Zefan, et al.
Published: (2025)
LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction
by: Zhou, Enshuai, et al.
Published: (2026)
AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
by: Yang, Qingyue, et al.
Published: (2025)
KV-CoRE: Benchmarking Data-Dependent Low-Rank Compressibility of KV-Caches in LLMs
by: Chen, Jian, et al.
Published: (2026)
Pyramid Cache: Layer-Adaptive KV Cache Compression with Signature-Based Cold Storage
by: Sergio dj
Published: (2026)
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
by: Fu, Yu, et al.
Published: (2024)
LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression
by: Hu, Lianyu, et al.
Published: (2025)
ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression
by: Liu, Guangda, et al.
Published: (2024)
FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference
by: Zhao, Bingzhe, et al.
Published: (2025)
PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference
by: Li, Ye, et al.
Published: (2024)
Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
by: Liu, Xuyang, et al.
Published: (2025)
Training Transformers for KV Cache Compressibility
by: Gelberg, Yoav, et al.
Published: (2026)
Lossless KV Cache Compression to 2%
by: Yang, Zhen, et al.
Published: (2024)
KVSculpt: KV Cache Compression as Distillation
by: Jiang, Bo, et al.
Published: (2026)
Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity
by: Ma, Da, et al.
Published: (2024)
EvolKV: Evolutionary KV Cache Compression for LLM Inference
by: Yu, Bohan, et al.
Published: (2025)