| Main Authors: | Chodavarapu, Ranjith, Xu, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.15409 |
Similar Items
Uncertainty-Aware Wildfire Smoke Density Classification from Satellite Imagery via CBAM-Augmented EfficientNet with Evidential Deep Learning
by: Chodavarapu, Ranjith
Published: (2026)
Probabilistic Dating of Historical Manuscripts via Evidential Deep Regression on Visual Script Features
by: Chodavarapu, Ranjith
Published: (2026)
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)
SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference
by: Zhao, Yi, et al.
Published: (2025)
Online Scheduling for LLM Inference with KV Cache Constraints
by: Jaillet, Patrick, et al.
Published: (2025)
Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching
by: Dong, Yanhao, et al.
Published: (2025)
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
by: Zhu, Yuxuan, et al.
Published: (2025)
KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
by: Tian, Yuxuan, et al.
Published: (2025)
Leyline: KV Cache Directives for Agentic Inference
by: Ma, Bole, et al.
Published: (2026)
The Pitfalls of KV Cache Compression
by: Chen, Alex, et al.
Published: (2025)
The Residual Stream Is All You Need: On the Redundancy of the KV Cache in Transformer Inference
by: Qasim, Kaleem Ullah, et al.
Published: (2026)
Defeating the Training-Inference Mismatch via FP16
by: Qi, Penghui, et al.
Published: (2025)
KV Cache Transform Coding for Compact Storage in LLM Inference
by: Staniszewski, Konrad, et al.
Published: (2025)
ReasonCache: Accelerating Large Reasoning Model Serving through KV Cache Sharing
by: Chen, Kaiwen, et al.
Published: (2025)
CacheClip: Accelerating RAG with Effective KV Cache Reuse
by: Yang, Bin, et al.
Published: (2025)
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
by: Dong, Harry, et al.
Published: (2024)
MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
by: Li, Kunxi, et al.
Published: (2025)
KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs
by: Chen, Chuangtao, et al.
Published: (2026)
CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
by: Wang, Yixuan, et al.
Published: (2025)
CoKV: Optimizing KV Cache Allocation via Cooperative Game
by: Sun, Qiheng, et al.
Published: (2025)
TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
by: Dzikanyanga, Gradwell, et al.
Published: (2026)
KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference
by: Nadali, Alireza, et al.
Published: (2026)
LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
by: Wu, Wenbo, et al.
Published: (2025)
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
by: Zhao, Youpeng, et al.
Published: (2024)
KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing
by: Yang, Yifei, et al.
Published: (2024)
RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse
by: Geng, Yingsheng, et al.
Published: (2026)
ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
by: He, Yefei, et al.
Published: (2024)
Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs
by: Bui, Ngoc, et al.
Published: (2025)
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
by: Ahn, Jinwoo, et al.
Published: (2026)
Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression
by: Kim, Munsik
Published: (2026)
Evaluating the Quality of Randomness and Entropy in Tasks Supported by Large Language Models
by: Karanjai, Rabimba, et al.
Published: (2025)
ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression
by: Liu, Guangda, et al.
Published: (2024)
Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression
by: Swain, Kabir, et al.
Published: (2026)
Palu: Compressing KV-Cache with Low-Rank Projection
by: Chang, Chi-Chih, et al.
Published: (2024)
PolarQuant: Quantizing KV Caches with Polar Transformation
by: Han, Insu, et al.
Published: (2025)
KVSculpt: KV Cache Compression as Distillation
by: Jiang, Bo, et al.
Published: (2026)
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
by: Kang, Hao, et al.
Published: (2024)
Crystal-KV: Efficient KV Cache Management for Chain-of-Thought LLMs via Answer-First Principle
by: Wang, Zihan, et al.
Published: (2026)
MARché: Fast Masked Autoregressive Image Generation with Cache-Aware Attention
by: Jiang, Chaoyi, et al.
Published: (2025)
SQuat: Subspace-orthogonal KV Cache Quantization
by: Wang, Hao, et al.
Published: (2025)