Zhang, Q., Zhou, K., Tang, D., Lu, K., Li, C., Yang, Z., . . . Wan, J. (2026). ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference.
Chicago Style (17th ed.) Citation: Zhang, Qiuyang, Kai Zhou, Ding Tang, Kai Lu, Cheng Li, Zhenyu Yang, Peng Xu, and Jiguang Wan. ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference. 2026.
MLA (9th ed.) Citation: Zhang, Qiuyang, et al. ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference. 2026.