| Main Authors: | Shen, Hanzhang, Wu, Haoran, Zhao, Yiren, Mullins, Robert |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.17170 |
Similar Items
AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization
by: Tan, Yifan, et al.
Published: (2024)
Leyline: KV Cache Directives for Agentic Inference
by: Ma, Bole, et al.
Published: (2026)
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
by: Hooper, Coleman, et al.
Published: (2024)
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)
EvolKV: Evolutionary KV Cache Compression for LLM Inference
by: Yu, Bohan, et al.
Published: (2025)
MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
by: Li, Kunxi, et al.
Published: (2025)
AsymKV: Enabling 1-Bit Quantization of KV Cache with Layer-Wise Asymmetric Quantization Configurations
by: Tao, Qian, et al.
Published: (2024)
SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference
by: Zhao, Yi, et al.
Published: (2025)
MomentKV: Closing the Directional Gap in KV Cache Eviction for Long-Context Inference
by: Li, Yu, et al.
Published: (2026)
Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost
by: Xia, Haojun, et al.
Published: (2025)
ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
by: He, Yefei, et al.
Published: (2024)
VeriCache: Turning Lossy KV Cache into Lossless LLM Inference
by: Yao, Jiayi, et al.
Published: (2026)
Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression
by: Swain, Kabir, et al.
Published: (2026)
PolarQuant: Quantizing KV Caches with Polar Transformation
by: Han, Insu, et al.
Published: (2025)
SQuat: Subspace-orthogonal KV Cache Quantization
by: Wang, Hao, et al.
Published: (2025)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
by: Sun, Hanshi, et al.
Published: (2024)
KV Pareto: Systems-Level Optimization of KV Cache and Model Compression for Long Context Inference
by: Gokhale, Sai, et al.
Published: (2025)
RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
by: Behnam, Payman, et al.
Published: (2025)
KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache
by: Li, Fei, et al.
Published: (2025)
Quantization Dominates Rank Reduction for KV-Cache Compression
by: Salfati, Samuel
Published: (2026)
RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations
by: Su, Zunhai, et al.
Published: (2025)
OjaKV: Context-Aware Online Low-Rank KV Cache Compression
by: Zhu, Yuxuan, et al.
Published: (2025)
MixKVQ: Query-Aware Mixed-Precision KV Cache Quantization for Long-Context Reasoning
by: Zhang, Tao, et al.
Published: (2025)
KV Cache Offloading for Context-Intensive Tasks
by: Bocharnikov, Andrey, et al.
Published: (2026)
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
by: Zhu, Yuxuan, et al.
Published: (2025)
KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
by: Tian, Yuxuan, et al.
Published: (2025)
KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models
by: Roy, Sourjya, et al.
Published: (2025)
KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference
by: Nadali, Alireza, et al.
Published: (2026)
KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
by: Li, Xing, et al.
Published: (2025)
LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference
by: Liu, Yuhan, et al.
Published: (2025)
CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM
by: Li, Yubo, et al.
Published: (2026)
Palu: Compressing KV-Cache with Low-Rank Projection
by: Chang, Chi-Chih, et al.
Published: (2024)
Inference-Time Hyper-Scaling with KV Cache Compression
by: Łańcucki, Adrian, et al.
Published: (2025)
KVLinC : KV Cache Quantization with Hadamard Rotation and Linear Correction
by: Saxena, Utkarsh, et al.
Published: (2025)
RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory
by: Zuo, Fei, et al.
Published: (2026)
LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
by: Wu, Wenbo, et al.
Published: (2025)
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
by: Zhang, Tianyi, et al.
Published: (2024)
No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
by: Yang, June Yong, et al.
Published: (2024)
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
by: Zhao, Youpeng, et al.
Published: (2024)
MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
by: Sharma, Akshat, et al.
Published: (2024)