| Main Authors: | Dong, Shichen, Cheng, Wen, Qin, Jiayu, Wang, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.04643 |
Similar Items
CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences
by: Qin, Ziran, et al.
Published: (2025)
KV-CoRE: Benchmarking Data-Dependent Low-Rank Compressibility of KV-Caches in LLMs
by: Chen, Jian, et al.
Published: (2026)
WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
by: Zuo, Youhui, et al.
Published: (2025)
AKVQ-VL: Attention-Aware KV Cache Adaptive 2-Bit Quantization for Vision-Language Models
by: Su, Zunhai, et al.
Published: (2025)
Accurate KV Cache Quantization with Outlier Tokens Tracing
by: Su, Yi, et al.
Published: (2025)
RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations
by: Su, Zunhai, et al.
Published: (2025)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
by: Sun, Hanshi, et al.
Published: (2024)
VecInfer: Efficient LLM Inference with Low-Bit KV Cache via Outlier-Suppressed Vector Quantization
by: Yao, Dingyu, et al.
Published: (2025)
QAQ: Bidirectional Semantic Coherence for Selecting High-Quality Synthetic Code Instructions
by: Lei, Jiayin, et al.
Published: (2026)
SQuat: Subspace-orthogonal KV Cache Quantization
by: Wang, Hao, et al.
Published: (2025)
Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression
by: Liu, Peiyu, et al.
Published: (2024)
R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
by: Cai, Zefan, et al.
Published: (2025)
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
by: Feng, Yuan, et al.
Published: (2024)
CommVQ: Commutative Vector Quantization for KV Cache Compression
by: Li, Junyan, et al.
Published: (2025)
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
by: Liu, Xiang, et al.
Published: (2025)
DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
by: Zhou, Xiabin, et al.
Published: (2024)
EvolKV: Evolutionary KV Cache Compression for LLM Inference
by: Yu, Bohan, et al.
Published: (2025)
Quantization Dominates Rank Reduction for KV-Cache Compression
by: Salfati, Samuel
Published: (2026)
One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
by: Lu, Liming, et al.
Published: (2026)
Efficient Long-Context LLM Inference via KV Cache Clustering
by: Hu, Jie, et al.
Published: (2025)
NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics
by: Cai, Zhihang, et al.
Published: (2025)
Taming the Fragility of KV Cache Eviction in LLM Inference
by: Feng, Yuan, et al.
Published: (2025)
KVSink: Understanding and Enhancing the Preservation of Attention Sinks in KV Cache Quantization for LLMs
by: Su, Zunhai, et al.
Published: (2025)
PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs
by: Liu, Tengxuan, et al.
Published: (2025)
PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
by: Yang, Dongjie, et al.
Published: (2024)
dKV-Cache: The Cache for Diffusion Language Models
by: Ma, Xinyin, et al.
Published: (2025)
G-KV: Decoding-Time KV Cache Eviction with Global Attention
by: Liao, Mengqi, et al.
Published: (2025)
TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
by: Dzikanyanga, Gradwell, et al.
Published: (2026)
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
by: Cai, Zefan, et al.
Published: (2024)
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)
WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization
by: Tao, Wei, et al.
Published: (2026)
KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
by: Tian, Yuxuan, et al.
Published: (2025)
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
by: Guo, Jinyu, et al.
Published: (2026)
XQuant: Achieving Ultra-Low Bit KV Cache Quantization with Cross-Layer Compression
by: Yang, Haoqi, et al.
Published: (2025)
QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead
by: Zandieh, Amir, et al.
Published: (2024)
HyperRAG: Enhancing Quality-Efficiency Tradeoffs in Retrieval-Augmented Generation with Reranker KV-Cache Reuse
by: An, Yuwei, et al.
Published: (2025)
SABlock: Semantic-Aware KV Cache Eviction with Adaptive Compression Block Size
by: Chen, Jinhan, et al.
Published: (2025)
DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference
by: Dehghanighobadi, Zahra, et al.
Published: (2026)
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
by: Liu, Zirui, et al.
Published: (2024)
KVzap: Fast, Adaptive, and Faithful KV Cache Pruning
by: Jegou, Simon, et al.
Published: (2026)