Library Catalog

Bibliographic Details
Main Authors:	Chen, Zhirui; Liu, Peiyang; Shao, Ling
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.06746

Similar Items

Joint Enhancement of Relational Reasoning for Long-Context LLMs
by: Chen, Zhirui, et al.
Published: (2025)

ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
by: Liu, Xiang, et al.
Published: (2025)

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
by: Sun, Hanshi, et al.
Published: (2024)

TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization
by: Yao, Dingyu, et al.
Published: (2025)

Efficient Long-Context LLM Inference via KV Cache Clustering
by: Hu, Jie, et al.
Published: (2025)

DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
by: Guo, Jinyu, et al.
Published: (2026)

DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference
by: Dehghanighobadi, Zahra, et al.
Published: (2026)

MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
by: Wan, Zhongwei, et al.
Published: (2025)

LycheeCluster: Efficient Long-Context Inference with Structure-Aware Chunking and Hierarchical KV Indexing
by: Li, Dongfang, et al.
Published: (2026)

NestedKV: Nested Memory Routing for Long-Context KV Cache Compression
by: Chen, Hong, et al.
Published: (2026)

Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity
by: Ma, Da, et al.
Published: (2024)

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference
by: Nadali, Alireza, et al.
Published: (2026)

Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
by: Mai, Tho, et al.
Published: (2026)

RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
by: Behnam, Payman, et al.
Published: (2025)

DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
by: Zhou, Xiabin, et al.
Published: (2024)

XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference
by: Li, Weizhuo, et al.
Published: (2024)

TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
by: Dzikanyanga, Gradwell, et al.
Published: (2026)

IndexMem: Learned KV-Cache Eviction with Latent Memory for Long-Context LLM Inference
by: Yang, Xintong, et al.
Published: (2026)

MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
by: Li, Kunxi, et al.
Published: (2025)

KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference
by: Lin, Jian, et al.
Published: (2026)

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
by: Wan, Zhongwei, et al.
Published: (2024)

StructCoh: Structured Contrastive Learning for Context-Aware Text Semantic Matching
by: Xue, Chao, et al.
Published: (2025)

OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference
by: Gu, Yuzhe, et al.
Published: (2025)

DynSplit-KV: Dynamic Semantic Splitting for KVCache Compression in Efficient Long-Context LLM Inference
by: Ye, Jiancai, et al.
Published: (2026)

ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs
by: Qi, Yanlin, et al.
Published: (2026)

StructMem: Structured Memory for Long-Horizon Behavior in LLMs
by: Xu, Buqiang, et al.
Published: (2026)

Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories
by: Liu, Peiyang, et al.
Published: (2026)

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
by: Li, Zhuoqun, et al.
Published: (2024)

LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference
by: Wang, Guangtao, et al.
Published: (2025)

HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference
by: Shi, Zhiyuan, et al.
Published: (2026)

Exploring Fine-Tuning for In-Context Retrieval and Efficient KV-Caching in Long-Context Language Models
by: Molfese, Francesco Maria, et al.
Published: (2026)

ZSMerge: Zero-Shot KV Cache Compression for Memory-Efficient Long-Context LLMs
by: Liu, Xin, et al.
Published: (2025)

CTkvr: KV Cache Retrieval for Long-Context LLMs via Centroid then Token Indexing
by: Lu, Kuan, et al.
Published: (2025)

WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
by: Zuo, Youhui, et al.
Published: (2025)

StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
by: Zhuang, Alex, et al.
Published: (2024)

StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs
by: Chen, Hailin, et al.
Published: (2024)

SCBench: A KV Cache-Centric Analysis of Long-Context Methods
by: Li, Yucheng, et al.
Published: (2024)

TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
by: Wu, Wei, et al.
Published: (2024)

SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation
by: Qiao, Aurick, et al.
Published: (2024)

Membership Inference Attack against Long-Context Large Language Models
by: Wang, Zixiong, et al.
Published: (2024)