:: Library Catalog

Bibliographic Details
Main Authors:	Mao, Weian; Lin, Xi; Huang, Wei; Xie, Yuxin; Fu, Tianfu; Zhuang, Bohan; Han, Song; Chen, Yukang
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.04921

Similar Items

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
by: Chen, Yukang, et al.
Published: (2026)

EvolKV: Evolutionary KV Cache Compression for LLM Inference
by: Yu, Bohan, et al.
Published: (2025)

FlashBlock: Attention Caching for Efficient Long-Context Block Diffusion
by: Chen, Zhuokun, et al.
Published: (2026)

RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
by: Tang, Hanlin, et al.
Published: (2024)

LongFlow: Efficient KV Cache Compression for Reasoning Models
by: Su, Yi, et al.
Published: (2026)

Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models
by: Ji, Yicheng, et al.
Published: (2026)

LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning
by: Zhang, Haoyue, et al.
Published: (2025)

PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation
by: Li, Xiaolong, et al.
Published: (2025)

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
by: Liu, Akide, et al.
Published: (2024)

KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head
by: Rehg, Isaac
Published: (2024)

TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
by: Wu, Wei, et al.
Published: (2024)

HDGlyph: A Hierarchical Disentangled Glyph-Based Framework for Long-Tail Text Rendering in Diffusion Models
by: Zhuang, Shuhan, et al.
Published: (2025)

NestedKV: Nested Memory Routing for Long-Context KV Cache Compression
by: Chen, Hong, et al.
Published: (2026)

LongVLM: Efficient Long Video Understanding via Large Language Models
by: Weng, Yuetian, et al.
Published: (2024)

ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
by: Liu, Xiang, et al.
Published: (2025)

WeGeFT: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models
by: Savadikar, Chinmay, et al.
Published: (2023)

SurfaceLogicKV: Surface and Logic Attention Behaviors are All You Need for Robust KV Cache Compression
by: Li, Mengjie, et al.
Published: (2025)

GRKV: Global Regression for Training-Free KV Cache Compression in Long-Context LLMs
by: Peng, Junjie, et al.
Published: (2026)

Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity
by: Ma, Da, et al.
Published: (2024)

DeltaKV: Residual-Based KV Cache Compression via Long-Range Similarity
by: Hao, Jitai, et al.
Published: (2026)

RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
by: Behnam, Payman, et al.
Published: (2025)

EliteKV: Scalable KV Cache Compression via RoPE Frequency Selection and Joint Low-Rank Projection
by: Zhou, Yuhao, et al.
Published: (2025)

Lossless KV Cache Compression to 2%
by: Yang, Zhen, et al.
Published: (2024)

AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference
by: Huang, Kai, et al.
Published: (2025)

R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
by: Cai, Zefan, et al.
Published: (2025)

G-KV: Decoding-Time KV Cache Eviction with Global Attention
by: Liao, Mengqi, et al.
Published: (2025)

DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
by: Zhou, Xiabin, et al.
Published: (2024)

X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
by: Li, Guihong, et al.
Published: (2025)

EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction
by: Ji, Shiyu, et al.
Published: (2026)

ForesightKV: Optimizing KV Cache Eviction for Reasoning Models by Learning Long-Term Contribution
by: Dong, Zican, et al.
Published: (2026)

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
by: Chen, Yukang, et al.
Published: (2023)

Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference
by: Fei, Weizhi, et al.
Published: (2025)

Head-Aware KV Cache Compression for Efficient Visual Autoregressive Modeling
by: Qin, Ziran, et al.
Published: (2025)

Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
by: Saxena, Utkarsh, et al.
Published: (2024)

Hold Onto That Thought: Assessing KV Cache Compression On Reasoning
by: Liu, Minghui, et al.
Published: (2025)

ZSMerge: Zero-Shot KV Cache Compression for Memory-Efficient Long-Context LLMs
by: Liu, Xin, et al.
Published: (2025)

AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
by: Yang, Qingyue, et al.
Published: (2025)

GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting
by: Cao, Yukang, et al.
Published: (2024)

Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning
by: Song, Jiwon, et al.
Published: (2025)

Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution
by: Devoto, Alessio, et al.
Published: (2025)