| Main Authors: | Kim, Buseong; Gwon, Heejun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.10900 |
Similar Items
OjaKV: Context-Aware Online Low-Rank KV Cache Compression
by: Zhu, Yuxuan, et al.
Published: (2025)
Training-Free Exponential Context Extension via Cascading KV Cache
by: Willette, Jeffrey, et al.
Published: (2024)
Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning
by: Corallo, Giulio, et al.
Published: (2025)
The Pitfalls of KV Cache Compression
by: Chen, Alex, et al.
Published: (2025)
No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
by: Yang, June Yong, et al.
Published: (2024)
CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
by: Wang, Yixuan, et al.
Published: (2025)
CompilerKV: Risk-Adaptive KV Compression via Offline Experience Compilation
by: Yang, Ning, et al.
Published: (2026)
SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection
by: Shukla, Shikhar
Published: (2026)
EliteKV: Scalable KV Cache Compression via RoPE Frequency Selection and Joint Low-Rank Projection
by: Zhou, Yuhao, et al.
Published: (2025)
ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression
by: Liu, Guangda, et al.
Published: (2024)
SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference
by: Zhao, Yi, et al.
Published: (2025)
MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection
by: Lin, Bokai, et al.
Published: (2024)
Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression
by: Swain, Kabir, et al.
Published: (2026)
Palu: Compressing KV-Cache with Low-Rank Projection
by: Chang, Chi-Chih, et al.
Published: (2024)
Revisiting Multimodal KV Cache Compression: A Frequency-Domain-Guided Outlier-KV-Aware Approach
by: Yang, Yaoxin, et al.
Published: (2025)
ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
by: Yan, Xianglong, et al.
Published: (2025)
FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression
by: Lee, Namyoon, et al.
Published: (2026)
KVSculpt: KV Cache Compression as Distillation
by: Jiang, Bo, et al.
Published: (2026)
Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces
by: Chowdhury, Sanjoy, et al.
Published: (2026)
MixKVQ: Query-Aware Mixed-Precision KV Cache Quantization for Long-Context Reasoning
by: Zhang, Tao, et al.
Published: (2025)
When Does Value-Aware KV Eviction Help? A Fixed-Contract Diagnostic for Non-Monotone Cache Compression
by: Zhang, Ruijie, et al.
Published: (2026)
How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers
by: Wang, Xiao
Published: (2026)
ManifoldKV: Training-Free KV Cache Compression via Euclidean Outlier Detection
by: Datta, Debajyoti, et al.
Published: (2026)
KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
by: Tian, Yuxuan, et al.
Published: (2025)
Efficient Epistemic Uncertainty Estimation for Large Language Models via Knowledge Distillation
by: Park, Seonghyeon, et al.
Published: (2026)
RAP: KV-Cache Compression via RoPE-Aligned Pruning
by: Xin, Jihao, et al.
Published: (2026)
Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression
by: Zhang, Te, et al.
Published: (2025)
CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM
by: Li, Yubo, et al.
Published: (2026)
Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression
by: Kim, Munsik
Published: (2026)
Quantization Dominates Rank Reduction for KV-Cache Compression
by: Salfati, Samuel
Published: (2026)
Position as Probability: Self-Supervised Transformers that Think Past Their Training for Length Extrapolation
by: Lee, Philip Heejun
Published: (2025)
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
by: Ahn, Jinwoo, et al.
Published: (2026)
QUOKA: Query-Oriented KV Selection For Efficient LLM Prefill
by: Jones, Dalton, et al.
Published: (2026)
ReasonCache: Accelerating Large Reasoning Model Serving through KV Cache Sharing
by: Chen, Kaiwen, et al.
Published: (2025)
Locality-Aware Redundancy Pruning for LLM Depth Compression
by: Yun, Vincent-Daniel, et al.
Published: (2026)
RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction
by: Liu, Sihao, et al.
Published: (2026)
TurboAngle: Near-Lossless KV Cache Compression via Uniform Angle Quantization
by: Patel, Dipkumar
Published: (2026)
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
by: Dong, Harry, et al.
Published: (2024)
RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse
by: Liu, Mingrui, et al.
Published: (2026)
Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
by: Saxena, Utkarsh, et al.
Published: (2024)