| Main Authors: | Chen, Zhirui; Liu, Peiyang; Shao, Ling |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.06746 |
Similar Items
Joint Enhancement of Relational Reasoning for Long-Context LLMs
by: Chen, Zhirui, et al.
Published: (2025)
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
by: Liu, Xiang, et al.
Published: (2025)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
by: Sun, Hanshi, et al.
Published: (2024)
TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization
by: Yao, Dingyu, et al.
Published: (2025)
Efficient Long-Context LLM Inference via KV Cache Clustering
by: Hu, Jie, et al.
Published: (2025)
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
by: Guo, Jinyu, et al.
Published: (2026)
DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference
by: Dehghanighobadi, Zahra, et al.
Published: (2026)
MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
by: Wan, Zhongwei, et al.
Published: (2025)
LycheeCluster: Efficient Long-Context Inference with Structure-Aware Chunking and Hierarchical KV Indexing
by: Li, Dongfang, et al.
Published: (2026)
NestedKV: Nested Memory Routing for Long-Context KV Cache Compression
by: Chen, Hong, et al.
Published: (2026)
Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity
by: Ma, Da, et al.
Published: (2024)
KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference
by: Nadali, Alireza, et al.
Published: (2026)
Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
by: Mai, Tho, et al.
Published: (2026)
RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
by: Behnam, Payman, et al.
Published: (2025)
DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
by: Zhou, Xiabin, et al.
Published: (2024)
XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference
by: Li, Weizhuo, et al.
Published: (2024)
TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
by: Dzikanyanga, Gradwell, et al.
Published: (2026)
IndexMem: Learned KV-Cache Eviction with Latent Memory for Long-Context LLM Inference
by: Yang, Xintong, et al.
Published: (2026)
MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
by: Li, Kunxi, et al.
Published: (2025)
KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference
by: Lin, Jian, et al.
Published: (2026)
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
by: Wan, Zhongwei, et al.
Published: (2024)
StructCoh: Structured Contrastive Learning for Context-Aware Text Semantic Matching
by: Xue, Chao, et al.
Published: (2025)
OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference
by: Gu, Yuzhe, et al.
Published: (2025)
DynSplit-KV: Dynamic Semantic Splitting for KVCache Compression in Efficient Long-Context LLM Inference
by: Ye, Jiancai, et al.
Published: (2026)
ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs
by: Qi, Yanlin, et al.
Published: (2026)
StructMem: Structured Memory for Long-Horizon Behavior in LLMs
by: Xu, Buqiang, et al.
Published: (2026)
Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories
by: Liu, Peiyang, et al.
Published: (2026)
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
by: Li, Zhuoqun, et al.
Published: (2024)
LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference
by: Wang, Guangtao, et al.
Published: (2025)
HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference
by: Shi, Zhiyuan, et al.
Published: (2026)
Exploring Fine-Tuning for In-Context Retrieval and Efficient KV-Caching in Long-Context Language Models
by: Molfese, Francesco Maria, et al.
Published: (2026)
ZSMerge: Zero-Shot KV Cache Compression for Memory-Efficient Long-Context LLMs
by: Liu, Xin, et al.
Published: (2025)
CTkvr: KV Cache Retrieval for Long-Context LLMs via Centroid then Token Indexing
by: Lu, Kuan, et al.
Published: (2025)
WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
by: Zuo, Youhui, et al.
Published: (2025)
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
by: Zhuang, Alex, et al.
Published: (2024)
StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs
by: Chen, Hailin, et al.
Published: (2024)
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
by: Li, Yucheng, et al.
Published: (2024)
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
by: Wu, Wei, et al.
Published: (2024)
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation
by: Qiao, Aurick, et al.
Published: (2024)
Membership Inference Attack against Long-Context Large Language Models
by: Wang, Zixiong, et al.
Published: (2024)