:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Lianjun; An, Hongli; Yan, Weiqi; Du, Xin; Zhang, Shengchuan; Liu, Huazhong; Zhong, Yunshan
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.00907

Similar Items

AHCQ-SAM: Toward Accurate and Hardware-Compatible Post-Training Segment Anything Model Quantization
by: Zhang, Wenlun, et al.
Published: (2025)

PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs
by: Liu, Tengxuan, et al.
Published: (2025)

UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs
by: Li, Wenhao, et al.
Published: (2024)

SABlock: Semantic-Aware KV Cache Eviction with Adaptive Compression Block Size
by: Chen, Jinhan, et al.
Published: (2025)

Test-Time Iterative Error Correction for Efficient Diffusion Models
by: Zhong, Yunshan, et al.
Published: (2025)

Keys to Robust Edits: from Theoretical Insights to Practical Advances
by: Yan, Jianhao, et al.
Published: (2024)

Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks
by: Wang, Zheng, et al.
Published: (2024)

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
by: Liu, Zirui, et al.
Published: (2024)

DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
by: Guo, Jinyu, et al.
Published: (2026)

ForesightKV: Optimizing KV Cache Eviction for Reasoning Models by Learning Long-Term Contribution
by: Dong, Zican, et al.
Published: (2026)

Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads
by: He, Xingyang, et al.
Published: (2025)

ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
by: Zhong, Meizhi, et al.
Published: (2024)

TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization
by: Yao, Dingyu, et al.
Published: (2025)

SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging
by: Wu, Shunlong, et al.
Published: (2026)

Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale
by: Liu, Jiale, et al.
Published: (2025)

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
by: Du, Yiyang, et al.
Published: (2025)

PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference
by: Patel, Ishan, et al.
Published: (2026)

Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
by: Du, Wenjie, et al.
Published: (2025)

FlowMM: Cross-Modal Information Flow Guided KV Cache Merging for Efficient Multimodal Context Inference
by: Li, Kunxi, et al.
Published: (2025)

WeightedKV: Attention Scores Weighted Key-Value Cache Merging for Large Language Models
by: Yuan, Jian, et al.
Published: (2025)

MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
by: Wan, Zhongwei, et al.
Published: (2025)

CriticalKV: Optimizing KV Cache Eviction from an Output Perturbation Perspective
by: Feng, Yuan, et al.
Published: (2025)

KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding
by: Shi, Luohe, et al.
Published: (2025)

SCOUT: Semi-supervised Camouflaged Object Detection by Utilizing Text and Adaptive Data Selection
by: Yan, Weiqi, et al.
Published: (2025)

ZSMerge: Zero-Shot KV Cache Compression for Memory-Efficient Long-Context LLMs
by: Liu, Xin, et al.
Published: (2025)

DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
by: Zhou, Xiabin, et al.
Published: (2024)

WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
by: Zuo, Youhui, et al.
Published: (2025)

NeuronScope: A Multi-Agent Framework for Explaining Polysemantic Neurons in Language Models
by: Liu, Weiqi, et al.
Published: (2026)

EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction
by: Ji, Shiyu, et al.
Published: (2026)

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
by: Sun, Hanshi, et al.
Published: (2024)

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
by: Du, Yihan, et al.
Published: (2024)

ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
by: Liu, Xiang, et al.
Published: (2025)

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
by: Cai, Zefan, et al.
Published: (2024)

LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging
by: Liu, Zehua, et al.
Published: (2025)

SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers
by: Tang, Zicong, et al.
Published: (2025)

Checkpoint Merging via Bayesian Optimization in LLM Pretraining
by: Liu, Deyuan, et al.
Published: (2024)

Dynamic Fisher-weighted Model Merging via Bayesian Optimization
by: Lee, Sanwoo, et al.
Published: (2025)

EMS: Adaptive Evict-then-Merge Strategy for Head-wise KV Cache Compression Based on Global-Local Importance
by: Li, Yingxin, et al.
Published: (2024)

NestedKV: Nested Memory Routing for Long-Context KV Cache Compression
by: Chen, Hong, et al.
Published: (2026)

1bit-Merging: Dynamic Quantized Merging for Large Language Models
by: Liu, Shuqi, et al.
Published: (2025)