Wang, Z., Cui, B., & Gan, S. (2024). SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget.
Chicago Style (17th ed.) Citation: Wang, Zihao, Bin Cui, and Shaoduo Gan. SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget. 2024.
MLA (9th ed.) Citation: Wang, Zihao, et al. SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget. 2024.