| Main Authors: | Liu, Lianjun, An, Hongli, Yan, Weiqi, Du, Xin, Zhang, Shengchuan, Liu, Huazhong, Zhong, Yunshan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.00907 |
Similar Items
AHCQ-SAM: Toward Accurate and Hardware-Compatible Post-Training Segment Anything Model Quantization
by: Zhang, Wenlun, et al.
Published: (2025)
PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs
by: Liu, Tengxuan, et al.
Published: (2025)
UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs
by: Li, Wenhao, et al.
Published: (2024)
SABlock: Semantic-Aware KV Cache Eviction with Adaptive Compression Block Size
by: Chen, Jinhan, et al.
Published: (2025)
Test-Time Iterative Error Correction for Efficient Diffusion Models
by: Zhong, Yunshan, et al.
Published: (2025)
Keys to Robust Edits: from Theoretical Insights to Practical Advances
by: Yan, Jianhao, et al.
Published: (2024)
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks
by: Wang, Zheng, et al.
Published: (2024)
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
by: Liu, Zirui, et al.
Published: (2024)
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
by: Guo, Jinyu, et al.
Published: (2026)
ForesightKV: Optimizing KV Cache Eviction for Reasoning Models by Learning Long-Term Contribution
by: Dong, Zican, et al.
Published: (2026)
Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads
by: He, Xingyang, et al.
Published: (2025)
ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
by: Zhong, Meizhi, et al.
Published: (2024)
TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization
by: Yao, Dingyu, et al.
Published: (2025)
SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging
by: Wu, Shunlong, et al.
Published: (2026)
Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale
by: Liu, Jiale, et al.
Published: (2025)
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
by: Du, Yiyang, et al.
Published: (2025)
PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference
by: Patel, Ishan, et al.
Published: (2026)
Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
by: Du, Wenjie, et al.
Published: (2025)
FlowMM: Cross-Modal Information Flow Guided KV Cache Merging for Efficient Multimodal Context Inference
by: Li, Kunxi, et al.
Published: (2025)
WeightedKV: Attention Scores Weighted Key-Value Cache Merging for Large Language Models
by: Yuan, Jian, et al.
Published: (2025)
MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
by: Wan, Zhongwei, et al.
Published: (2025)
CriticalKV: Optimizing KV Cache Eviction from an Output Perturbation Perspective
by: Feng, Yuan, et al.
Published: (2025)
KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding
by: Shi, Luohe, et al.
Published: (2025)
SCOUT: Semi-supervised Camouflaged Object Detection by Utilizing Text and Adaptive Data Selection
by: Yan, Weiqi, et al.
Published: (2025)
ZSMerge: Zero-Shot KV Cache Compression for Memory-Efficient Long-Context LLMs
by: Liu, Xin, et al.
Published: (2025)
DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
by: Zhou, Xiabin, et al.
Published: (2024)
WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
by: Zuo, Youhui, et al.
Published: (2025)
NeuronScope: A Multi-Agent Framework for Explaining Polysemantic Neurons in Language Models
by: Liu, Weiqi, et al.
Published: (2026)
EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction
by: Ji, Shiyu, et al.
Published: (2026)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
by: Sun, Hanshi, et al.
Published: (2024)
Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
by: Du, Yihan, et al.
Published: (2024)
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
by: Liu, Xiang, et al.
Published: (2025)
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
by: Cai, Zefan, et al.
Published: (2024)
LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging
by: Liu, Zehua, et al.
Published: (2025)
SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers
by: Tang, Zicong, et al.
Published: (2025)
Checkpoint Merging via Bayesian Optimization in LLM Pretraining
by: Liu, Deyuan, et al.
Published: (2024)
Dynamic Fisher-weighted Model Merging via Bayesian Optimization
by: Lee, Sanwoo, et al.
Published: (2025)
EMS: Adaptive Evict-then-Merge Strategy for Head-wise KV Cache Compression Based on Global-Local Importance
by: Li, Yingxin, et al.
Published: (2024)
NestedKV: Nested Memory Routing for Long-Context KV Cache Compression
by: Chen, Hong, et al.
Published: (2026)
1bit-Merging: Dynamic Quantized Merging for Large Language Models
by: Liu, Shuqi, et al.
Published: (2025)