Bibliographic Details
Main Authors:	Chodavarapu, Ranjith; Xu, Lei
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.15409

Similar Items

Uncertainty-Aware Wildfire Smoke Density Classification from Satellite Imagery via CBAM-Augmented EfficientNet with Evidential Deep Learning
by: Chodavarapu, Ranjith
Published: (2026)

Probabilistic Dating of Historical Manuscripts via Evidential Deep Regression on Visual Script Features
by: Chodavarapu, Ranjith
Published: (2026)

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)

SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference
by: Zhao, Yi, et al.
Published: (2025)

Online Scheduling for LLM Inference with KV Cache Constraints
by: Jaillet, Patrick, et al.
Published: (2025)

Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching
by: Dong, Yanhao, et al.
Published: (2025)

SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
by: Zhu, Yuxuan, et al.
Published: (2025)

KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
by: Tian, Yuxuan, et al.
Published: (2025)

Leyline: KV Cache Directives for Agentic Inference
by: Ma, Bole, et al.
Published: (2026)

The Pitfalls of KV Cache Compression
by: Chen, Alex, et al.
Published: (2025)

The Residual Stream Is All You Need: On the Redundancy of the KV Cache in Transformer Inference
by: Qasim, Kaleem Ullah, et al.
Published: (2026)

Defeating the Training-Inference Mismatch via FP16
by: Qi, Penghui, et al.
Published: (2025)

KV Cache Transform Coding for Compact Storage in LLM Inference
by: Staniszewski, Konrad, et al.
Published: (2025)

ReasonCache: Accelerating Large Reasoning Model Serving through KV Cache Sharing
by: Chen, Kaiwen, et al.
Published: (2025)

CacheClip: Accelerating RAG with Effective KV Cache Reuse
by: Yang, Bin, et al.
Published: (2025)

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
by: Dong, Harry, et al.
Published: (2024)

MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
by: Li, Kunxi, et al.
Published: (2025)

KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs
by: Chen, Chuangtao, et al.
Published: (2026)

CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
by: Wang, Yixuan, et al.
Published: (2025)

CoKV: Optimizing KV Cache Allocation via Cooperative Game
by: Sun, Qiheng, et al.
Published: (2025)

TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
by: Dzikanyanga, Gradwell, et al.
Published: (2026)

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference
by: Nadali, Alireza, et al.
Published: (2026)

LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
by: Wu, Wenbo, et al.
Published: (2025)

ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
by: Zhao, Youpeng, et al.
Published: (2024)

KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing
by: Yang, Yifei, et al.
Published: (2024)

RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse
by: Geng, Yingsheng, et al.
Published: (2026)

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
by: He, Yefei, et al.
Published: (2024)

Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs
by: Bui, Ngoc, et al.
Published: (2025)

LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
by: Ahn, Jinwoo, et al.
Published: (2026)

Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression
by: Kim, Munsik
Published: (2026)

Evaluating the Quality of Randomness and Entropy in Tasks Supported by Large Language Models
by: Karanjai, Rabimba, et al.
Published: (2025)

ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression
by: Liu, Guangda, et al.
Published: (2024)

Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression
by: Swain, Kabir, et al.
Published: (2026)

Palu: Compressing KV-Cache with Low-Rank Projection
by: Chang, Chi-Chih, et al.
Published: (2024)

PolarQuant: Quantizing KV Caches with Polar Transformation
by: Han, Insu, et al.
Published: (2025)

KVSculpt: KV Cache Compression as Distillation
by: Jiang, Bo, et al.
Published: (2026)

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
by: Kang, Hao, et al.
Published: (2024)

Crystal-KV: Efficient KV Cache Management for Chain-of-Thought LLMs via Answer-First Principle
by: Wang, Zihan, et al.
Published: (2026)

MARché: Fast Masked Autoregressive Image Generation with Cache-Aware Attention
by: Jiang, Chaoyi, et al.
Published: (2025)

SQuat: Subspace-orthogonal KV Cache Quantization
by: Wang, Hao, et al.
Published: (2025)