:: Library Catalog

Bibliographic Details
Main Authors:	Kim, Buseong; Gwon, Heejun
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2604.10900

Similar Items

OjaKV: Context-Aware Online Low-Rank KV Cache Compression
by: Zhu, Yuxuan, et al.
Published: (2025)

Training-Free Exponential Context Extension via Cascading KV Cache
by: Willette, Jeffrey, et al.
Published: (2024)

Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning
by: Corallo, Giulio, et al.
Published: (2025)

The Pitfalls of KV Cache Compression
by: Chen, Alex, et al.
Published: (2025)

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
by: Yang, June Yong, et al.
Published: (2024)

CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
by: Wang, Yixuan, et al.
Published: (2025)

CompilerKV: Risk-Adaptive KV Compression via Offline Experience Compilation
by: Yang, Ning, et al.
Published: (2026)

SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection
by: Shukla, Shikhar
Published: (2026)

EliteKV: Scalable KV Cache Compression via RoPE Frequency Selection and Joint Low-Rank Projection
by: Zhou, Yuhao, et al.
Published: (2025)

ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression
by: Liu, Guangda, et al.
Published: (2024)

SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference
by: Zhao, Yi, et al.
Published: (2025)

MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection
by: Lin, Bokai, et al.
Published: (2024)

Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression
by: Swain, Kabir, et al.
Published: (2026)

Palu: Compressing KV-Cache with Low-Rank Projection
by: Chang, Chi-Chih, et al.
Published: (2024)

Revisiting Multimodal KV Cache Compression: A Frequency-Domain-Guided Outlier-KV-Aware Approach
by: Yang, Yaoxin, et al.
Published: (2025)

ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
by: Yan, Xianglong, et al.
Published: (2025)

FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression
by: Lee, Namyoon, et al.
Published: (2026)

KVSculpt: KV Cache Compression as Distillation
by: Jiang, Bo, et al.
Published: (2026)

Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces
by: Chowdhury, Sanjoy, et al.
Published: (2026)

MixKVQ: Query-Aware Mixed-Precision KV Cache Quantization for Long-Context Reasoning
by: Zhang, Tao, et al.
Published: (2025)

When Does Value-Aware KV Eviction Help? A Fixed-Contract Diagnostic for Non-Monotone Cache Compression
by: Zhang, Ruijie, et al.
Published: (2026)

How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers
by: Wang, Xiao
Published: (2026)

ManifoldKV: Training-Free KV Cache Compression via Euclidean Outlier Detection
by: Datta, Debajyoti, et al.
Published: (2026)

KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
by: Tian, Yuxuan, et al.
Published: (2025)

Efficient Epistemic Uncertainty Estimation for Large Language Models via Knowledge Distillation
by: Park, Seonghyeon, et al.
Published: (2026)

RAP: KV-Cache Compression via RoPE-Aligned Pruning
by: Xin, Jihao, et al.
Published: (2026)

Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression
by: Zhang, Te, et al.
Published: (2025)

CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM
by: Li, Yubo, et al.
Published: (2026)

Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression
by: Kim, Munsik
Published: (2026)

Quantization Dominates Rank Reduction for KV-Cache Compression
by: Salfati, Samuel
Published: (2026)

Position as Probability: Self-Supervised Transformers that Think Past Their Training for Length Extrapolation
by: Lee, Philip Heejun
Published: (2025)

LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
by: Ahn, Jinwoo, et al.
Published: (2026)

QUOKA: Query-Oriented KV Selection For Efficient LLM Prefill
by: Jones, Dalton, et al.
Published: (2026)

ReasonCache: Accelerating Large Reasoning Model Serving through KV Cache Sharing
by: Chen, Kaiwen, et al.
Published: (2025)

Locality-Aware Redundancy Pruning for LLM Depth Compression
by: Yun, Vincent-Daniel, et al.
Published: (2026)

RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction
by: Liu, Sihao, et al.
Published: (2026)

TurboAngle: Near-Lossless KV Cache Compression via Uniform Angle Quantization
by: Patel, Dipkumar
Published: (2026)

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
by: Dong, Harry, et al.
Published: (2024)

RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse
by: Liu, Mingrui, et al.
Published: (2026)

Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
by: Saxena, Utkarsh, et al.
Published: (2024)