Liu, X., He, B., Liu, X., Luo, A., Zhang, H., & Chen, H. (2026). 98$\times$ Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router.
Chicago Style (17th ed.) Citation: Liu, Xunzhuo, Bowei He, Xue Liu, Andy Luo, Haichen Zhang, and Huamin Chen. 98$\times$ Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router. 2026.
MLA (9th ed.) Citation: Liu, Xunzhuo, et al. 98$\times$ Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router. 2026.