| Main Authors: | Takbir, Nazmul, Alikhani, Hamidreza, Dutt, Nikil, Jyothi, Sangeetha Abdu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.00868 |
Similar Items
LBI: Parallel Scan Backpropagation via Latent Bounded Interfaces
by: Lee, Shaun Christopher, et al.
Published: (2026)
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
by: Tang, Hanlin, et al.
Published: (2024)
CrystalBox: Future-Based Explanations for Input-Driven Deep RL Systems
by: Patel, Sagar, et al.
Published: (2023)
AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
by: Yang, Qingyue, et al.
Published: (2025)
AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
by: Song, Yurun, et al.
Published: (2025)
Beyond KV Caching: Shared Attention for Efficient LLMs
by: Liao, Bingli, et al.
Published: (2024)
Leveraging Traceroute Inconsistencies to Improve IP Geolocation
by: Ramanathan, Alagappan, et al.
Published: (2025)
ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
by: He, Yefei, et al.
Published: (2024)
Sparse Attention across Multiple-context KV Cache
by: Cao, Ziyi, et al.
Published: (2025)
Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management
by: Zheng, Haoyu, et al.
Published: (2026)
CacheClip: Accelerating RAG with Effective KV Cache Reuse
by: Yang, Bin, et al.
Published: (2025)
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
by: Ye, Lu, et al.
Published: (2024)
ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
by: Ramachandran, Akshat, et al.
Published: (2025)
The Pitfalls of KV Cache Compression
by: Chen, Alex, et al.
Published: (2025)
In-context KV-Cache Eviction for LLMs via Attention-Gate
by: Zeng, Zihao, et al.
Published: (2024)
KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity
by: Lesens, Damien, et al.
Published: (2025)
Training Transformers for KV Cache Compressibility
by: Gelberg, Yoav, et al.
Published: (2026)
Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
by: Saxena, Utkarsh, et al.
Published: (2024)
Sparse Attention as a Range Searching Problem: Towards an Inference-Efficient Index for KV Cache
by: Dehghankar, Mohsen, et al.
Published: (2026)
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)
KVCompose: Efficient Structured KV Cache Compression with Composite Tokens
by: Akulov, Dmitry, et al.
Published: (2025)
ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
by: Yan, Xianglong, et al.
Published: (2025)
VeriCache: Turning Lossy KV Cache into Lossless LLM Inference
by: Yao, Jiayi, et al.
Published: (2026)
LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
by: Wu, Wenbo, et al.
Published: (2025)
Attention Is All You Need for KV Cache in Diffusion LLMs
by: Nguyen-Tri, Quan, et al.
Published: (2025)
Crystal-KV: Efficient KV Cache Management for Chain-of-Thought LLMs via Answer-First Principle
by: Wang, Zihan, et al.
Published: (2026)
Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs
by: Bui, Ngoc, et al.
Published: (2025)
RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse
by: Geng, Yingsheng, et al.
Published: (2026)
LongFlow: Efficient KV Cache Compression for Reasoning Models
by: Su, Yi, et al.
Published: (2026)
Compute Or Load KV Cache? Why Not Both?
by: Jin, Shuowei, et al.
Published: (2024)
LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference
by: Liu, Yuhan, et al.
Published: (2025)
HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization
by: Williams, Jorge L. Ruiz
Published: (2026)
Hierarchical Adaptive Eviction for KV Cache Management in Multimodal Language Models
by: Ma, Xindian, et al.
Published: (2026)
CoopQ: Cooperative Game Inspired Layerwise Mixed Precision Quantization for LLMs
by: Zhao, Junchen, et al.
Published: (2025)
TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
by: Dzikanyanga, Gradwell, et al.
Published: (2026)
LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
by: Shi, Dachuan, et al.
Published: (2025)
Transactional Attention: Semantic Sponsorship for KV-Cache Retention
by: Basu, Abhinaba
Published: (2026)
KVSculpt: KV Cache Compression as Distillation
by: Jiang, Bo, et al.
Published: (2026)
ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference
by: Zhang, Qiuyang, et al.
Published: (2026)
ReasonCache: Accelerating Large Reasoning Model Serving through KV Cache Sharing
by: Chen, Kaiwen, et al.
Published: (2025)