Library Catalog

Bibliographic Details
Main Authors:	Takbir, Nazmul; Alikhani, Hamidreza; Dutt, Nikil; Jyothi, Sangeetha Abdu
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2511.00868

Similar Items

LBI: Parallel Scan Backpropagation via Latent Bounded Interfaces
by: Lee, Shaun Christopher, et al.
Published: (2026)

RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
by: Tang, Hanlin, et al.
Published: (2024)

CrystalBox: Future-Based Explanations for Input-Driven Deep RL Systems
by: Patel, Sagar, et al.
Published: (2023)

AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
by: Yang, Qingyue, et al.
Published: (2025)

AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
by: Song, Yurun, et al.
Published: (2025)

Beyond KV Caching: Shared Attention for Efficient LLMs
by: Liao, Bingli, et al.
Published: (2024)

Leveraging Traceroute Inconsistencies to Improve IP Geolocation
by: Ramanathan, Alagappan, et al.
Published: (2025)

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
by: He, Yefei, et al.
Published: (2024)

Sparse Attention across Multiple-context KV Cache
by: Cao, Ziyi, et al.
Published: (2025)

Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management
by: Zheng, Haoyu, et al.
Published: (2026)

CacheClip: Accelerating RAG with Effective KV Cache Reuse
by: Yang, Bin, et al.
Published: (2025)

ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
by: Ye, Lu, et al.
Published: (2024)

ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
by: Ramachandran, Akshat, et al.
Published: (2025)

The Pitfalls of KV Cache Compression
by: Chen, Alex, et al.
Published: (2025)

In-context KV-Cache Eviction for LLMs via Attention-Gate
by: Zeng, Zihao, et al.
Published: (2024)

KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity
by: Lesens, Damien, et al.
Published: (2025)

Training Transformers for KV Cache Compressibility
by: Gelberg, Yoav, et al.
Published: (2026)

Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
by: Saxena, Utkarsh, et al.
Published: (2024)

Sparse Attention as a Range Searching Problem: Towards an Inference-Efficient Index for KV Cache
by: Dehghankar, Mohsen, et al.
Published: (2026)

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)

KVCompose: Efficient Structured KV Cache Compression with Composite Tokens
by: Akulov, Dmitry, et al.
Published: (2025)

ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
by: Yan, Xianglong, et al.
Published: (2025)

VeriCache: Turning Lossy KV Cache into Lossless LLM Inference
by: Yao, Jiayi, et al.
Published: (2026)

LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
by: Wu, Wenbo, et al.
Published: (2025)

Attention Is All You Need for KV Cache in Diffusion LLMs
by: Nguyen-Tri, Quan, et al.
Published: (2025)

Crystal-KV: Efficient KV Cache Management for Chain-of-Thought LLMs via Answer-First Principle
by: Wang, Zihan, et al.
Published: (2026)

Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs
by: Bui, Ngoc, et al.
Published: (2025)

RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse
by: Geng, Yingsheng, et al.
Published: (2026)

LongFlow: Efficient KV Cache Compression for Reasoning Models
by: Su, Yi, et al.
Published: (2026)

Compute Or Load KV Cache? Why Not Both?
by: Jin, Shuowei, et al.
Published: (2024)

LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference
by: Liu, Yuhan, et al.
Published: (2025)

HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization
by: Williams, Jorge L. Ruiz
Published: (2026)

Hierarchical Adaptive Eviction for KV Cache Management in Multimodal Language Models
by: Ma, Xindian, et al.
Published: (2026)

CoopQ: Cooperative Game Inspired Layerwise Mixed Precision Quantization for LLMs
by: Zhao, Junchen, et al.
Published: (2025)

TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
by: Dzikanyanga, Gradwell, et al.
Published: (2026)

LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
by: Shi, Dachuan, et al.
Published: (2025)

Transactional Attention: Semantic Sponsorship for KV-Cache Retention
by: Basu, Abhinaba
Published: (2026)

KVSculpt: KV Cache Compression as Distillation
by: Jiang, Bo, et al.
Published: (2026)

ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference
by: Zhang, Qiuyang, et al.
Published: (2026)

ReasonCache: Accelerating Large Reasoning Model Serving through KV Cache Sharing
by: Chen, Kaiwen, et al.
Published: (2025)