:: Library Catalog

Bibliographic Details
Main Authors:	He, Xingyang; Liu, Jie; Chen, Shaowei
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2501.15113

Similar Items

DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
by: Zhou, Xiabin, et al.
Published: (2024)

KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head
by: Rehg, Isaac
Published: (2024)

WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
by: Zuo, Youhui, et al.
Published: (2025)

KV Cache Offloading for Context-Intensive Tasks
by: Bocharnikov, Andrey, et al.
Published: (2026)

R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
by: Cai, Zefan, et al.
Published: (2025)

RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
by: Tang, Hanlin, et al.
Published: (2024)

KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding
by: Shi, Luohe, et al.
Published: (2025)

G-KV: Decoding-Time KV Cache Eviction with Global Attention
by: Liao, Mengqi, et al.
Published: (2025)

AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
by: Yang, Qingyue, et al.
Published: (2025)

ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
by: Liu, Xiang, et al.
Published: (2025)

EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse
by: Guo, Tianyu, et al.
Published: (2025)

SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
by: Zhu, Yuxuan, et al.
Published: (2025)

Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache
by: Liu, Xiaoran, et al.
Published: (2025)

EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction
by: Ji, Shiyu, et al.
Published: (2026)

TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization
by: Yao, Dingyu, et al.
Published: (2025)

CriticalKV: Optimizing KV Cache Eviction from an Output Perturbation Perspective
by: Feng, Yuan, et al.
Published: (2025)

SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging
by: Wu, Shunlong, et al.
Published: (2026)

Sparse Attention across Multiple-context KV Cache
by: Cao, Ziyi, et al.
Published: (2025)

Transactional Attention: Semantic Sponsorship for KV-Cache Retention
by: Basu, Abhinaba
Published: (2026)

Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
by: Du, Wenjie, et al.
Published: (2025)

ForesightKV: Optimizing KV Cache Eviction for Reasoning Models by Learning Long-Term Contribution
by: Dong, Zican, et al.
Published: (2026)

DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
by: Guo, Jinyu, et al.
Published: (2026)

In-context KV-Cache Eviction for LLMs via Attention-Gate
by: Zeng, Zihao, et al.
Published: (2024)

NestedKV: Nested Memory Routing for Long-Context KV Cache Compression
by: Chen, Hong, et al.
Published: (2026)

Efficient Long-Context LLM Inference via KV Cache Clustering
by: Hu, Jie, et al.
Published: (2025)

GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness
by: Huang, Kung-Hsiang, et al.
Published: (2025)

KV-CoRE: Benchmarking Data-Dependent Low-Rank Compressibility of KV-Caches in LLMs
by: Chen, Jian, et al.
Published: (2026)

SABlock: Semantic-Aware KV Cache Eviction with Adaptive Compression Block Size
by: Chen, Jinhan, et al.
Published: (2025)

EvolKV: Evolutionary KV Cache Compression for LLM Inference
by: Yu, Bohan, et al.
Published: (2025)

Towards Threshold-Free KV Cache Pruning
by: Ni, Xuanfan, et al.
Published: (2025)

Beyond KV Caching: Shared Attention for Efficient LLMs
by: Liao, Bingli, et al.
Published: (2024)

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
by: Cai, Zefan, et al.
Published: (2024)

DeltaKV: Residual-Based KV Cache Compression via Long-Range Similarity
by: Hao, Jitai, et al.
Published: (2026)

Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
by: Feng, Yuan, et al.
Published: (2024)

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
by: Sun, Hanshi, et al.
Published: (2024)

Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning
by: Corallo, Giulio, et al.
Published: (2025)

RefreshKV: Updating Small KV Cache During Long-form Generation
by: Xu, Fangyuan, et al.
Published: (2024)

SurfaceLogicKV: Surface and Logic Attention Behaviors are All You Need for Robust KV Cache Compression
by: Li, Mengjie, et al.
Published: (2025)

AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models
by: Gu, Yifeng, et al.
Published: (2025)