:: Library Catalog

Bibliographic Details
Main Authors:	Yi, Zhonghua; Niu, Ge; Wang, Lei; Tang, Wei; Zhang, Liqiu
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2411.15785

Similar Items

dKV-Cache: The Cache for Diffusion Language Models
by: Ma, Xinyin, et al.
Published: (2025)

RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction
by: Liu, Sihao, et al.
Published: (2026)

KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
by: Yang, Jingbo, et al.
Published: (2025)

SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers
by: Tang, Zicong, et al.
Published: (2025)

$A^3$: Attention-Aware Accurate KV Cache Fusion for Fast Large Language Model Serving
by: Zhou, Yuechi, et al.
Published: (2025)

Layer-Condensed KV Cache for Efficient Inference of Large Language Models
by: Wu, Haoyi, et al.
Published: (2024)

MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
by: Li, Kunxi, et al.
Published: (2025)

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
by: Liu, Akide, et al.
Published: (2024)

AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models
by: Gu, Yifeng, et al.
Published: (2025)

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
by: Ge, Suyu, et al.
Published: (2023)

AnTKV: Anchor Token-Aware Sub-Bit Vector Quantization for KV Cache in Large Language Models
by: Li, Zeyu, et al.
Published: (2025)

FlowMM: Cross-Modal Information Flow Guided KV Cache Merging for Efficient Multimodal Context Inference
by: Li, Kunxi, et al.
Published: (2025)

CaliDrop: KV Cache Compression with Calibration
by: Su, Yi, et al.
Published: (2025)

EntropyCache: Decoded Token Entropy Guided KV Caching for Diffusion Language Models
by: Cheong, Minsoo, et al.
Published: (2026)

LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
by: Shi, Dachuan, et al.
Published: (2025)

Accurate KV Cache Quantization with Outlier Tokens Tracing
by: Su, Yi, et al.
Published: (2025)

LongFlow: Efficient KV Cache Compression for Reasoning Models
by: Su, Yi, et al.
Published: (2026)

QAQ: Quality Adaptive Quantization for LLM KV Cache
by: Dong, Shichen, et al.
Published: (2024)

Crystal-KV: Efficient KV Cache Management for Chain-of-Thought LLMs via Answer-First Principle
by: Wang, Zihan, et al.
Published: (2026)

WeightedKV: Attention Scores Weighted Key-Value Cache Merging for Large Language Models
by: Yuan, Jian, et al.
Published: (2025)

DynaSaur: Large Language Agents Beyond Predefined Actions
by: Nguyen, Dang, et al.
Published: (2024)

TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization
by: Yao, Dingyu, et al.
Published: (2025)

R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
by: Cai, Zefan, et al.
Published: (2025)

SCBench: A KV Cache-Centric Analysis of Long-Context Methods
by: Li, Yucheng, et al.
Published: (2024)

When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?
by: Liu, Tianyu, et al.
Published: (2026)

XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference
by: Li, Weizhuo, et al.
Published: (2024)

AKVQ-VL: Attention-Aware KV Cache Adaptive 2-Bit Quantization for Vision-Language Models
by: Su, Zunhai, et al.
Published: (2025)

G-KV: Decoding-Time KV Cache Eviction with Global Attention
by: Liao, Mengqi, et al.
Published: (2025)

Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption
by: Shi, Luohe, et al.
Published: (2024)

InnerQ: Hardware-Aware Tuning-Free Quantization of KV Cache for Large Language Models
by: Hosseini, Sayed Mohammadreza Tayaranian, et al.
Published: (2026)

WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
by: Zuo, Youhui, et al.
Published: (2025)

DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
by: Zhou, Xiabin, et al.
Published: (2024)

Efficient Long-Context LLM Inference via KV Cache Clustering
by: Hu, Jie, et al.
Published: (2025)

ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing
by: An, Yongqi, et al.
Published: (2026)

ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
by: Liu, Xiang, et al.
Published: (2025)

EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction
by: Ji, Shiyu, et al.
Published: (2026)

ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
by: Zhong, Meizhi, et al.
Published: (2024)

GraphKV: Breaking the Static Selection Paradigm with Graph-Based KV Cache Eviction
by: Li, Xuelin, et al.
Published: (2025)

NestedKV: Nested Memory Routing for Long-Context KV Cache Compression
by: Chen, Hong, et al.
Published: (2026)

DeltaKV: Residual-Based KV Cache Compression via Long-Range Similarity
by: Hao, Jitai, et al.
Published: (2026)