| Main Authors: | He, Xingyang, Liu, Jie, Chen, Shaowei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.15113 |
Similar Items
DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
by: Zhou, Xiabin, et al.
Published: (2024)
KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head
by: Rehg, Isaac
Published: (2024)
WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
by: Zuo, Youhui, et al.
Published: (2025)
KV Cache Offloading for Context-Intensive Tasks
by: Bocharnikov, Andrey, et al.
Published: (2026)
R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
by: Cai, Zefan, et al.
Published: (2025)
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
by: Tang, Hanlin, et al.
Published: (2024)
KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding
by: Shi, Luohe, et al.
Published: (2025)
G-KV: Decoding-Time KV Cache Eviction with Global Attention
by: Liao, Mengqi, et al.
Published: (2025)
AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
by: Yang, Qingyue, et al.
Published: (2025)
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
by: Liu, Xiang, et al.
Published: (2025)
EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse
by: Guo, Tianyu, et al.
Published: (2025)
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
by: Zhu, Yuxuan, et al.
Published: (2025)
Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache
by: Liu, Xiaoran, et al.
Published: (2025)
EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction
by: Ji, Shiyu, et al.
Published: (2026)
TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization
by: Yao, Dingyu, et al.
Published: (2025)
CriticalKV: Optimizing KV Cache Eviction from an Output Perturbation Perspective
by: Feng, Yuan, et al.
Published: (2025)
SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging
by: Wu, Shunlong, et al.
Published: (2026)
Sparse Attention across Multiple-context KV Cache
by: Cao, Ziyi, et al.
Published: (2025)
Transactional Attention: Semantic Sponsorship for KV-Cache Retention
by: Basu, Abhinaba
Published: (2026)
Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
by: Du, Wenjie, et al.
Published: (2025)
ForesightKV: Optimizing KV Cache Eviction for Reasoning Models by Learning Long-Term Contribution
by: Dong, Zican, et al.
Published: (2026)
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
by: Guo, Jinyu, et al.
Published: (2026)
In-context KV-Cache Eviction for LLMs via Attention-Gate
by: Zeng, Zihao, et al.
Published: (2024)
NestedKV: Nested Memory Routing for Long-Context KV Cache Compression
by: Chen, Hong, et al.
Published: (2026)
Efficient Long-Context LLM Inference via KV Cache Clustering
by: Hu, Jie, et al.
Published: (2025)
GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness
by: Huang, Kung-Hsiang, et al.
Published: (2025)
KV-CoRE: Benchmarking Data-Dependent Low-Rank Compressibility of KV-Caches in LLMs
by: Chen, Jian, et al.
Published: (2026)
SABlock: Semantic-Aware KV Cache Eviction with Adaptive Compression Block Size
by: Chen, Jinhan, et al.
Published: (2025)
EvolKV: Evolutionary KV Cache Compression for LLM Inference
by: Yu, Bohan, et al.
Published: (2025)
Towards Threshold-Free KV Cache Pruning
by: Ni, Xuanfan, et al.
Published: (2025)
Beyond KV Caching: Shared Attention for Efficient LLMs
by: Liao, Bingli, et al.
Published: (2024)
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
by: Cai, Zefan, et al.
Published: (2024)
DeltaKV: Residual-Based KV Cache Compression via Long-Range Similarity
by: Hao, Jitai, et al.
Published: (2026)
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
by: Feng, Yuan, et al.
Published: (2024)
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
by: Sun, Hanshi, et al.
Published: (2024)
Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning
by: Corallo, Giulio, et al.
Published: (2025)
RefreshKV: Updating Small KV Cache During Long-form Generation
by: Xu, Fangyuan, et al.
Published: (2024)
SurfaceLogicKV: Surface and Logic Attention Behaviors are All You Need for Robust KV Cache Compression
by: Li, Mengjie, et al.
Published: (2025)
AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models
by: Gu, Yifeng, et al.
Published: (2025)