Record Citations :: Library Catalog

APA (7th ed.) Citation

Liu, X., He, B., Liu, X., Luo, A., Zhang, H., & Chen, H. (2026). 98× Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router.

Chicago Style (17th ed.) Citation

Liu, Xunzhuo, Bowei He, Xue Liu, Andy Luo, Haichen Zhang, and Huamin Chen. 98× Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router. 2026.

MLA (9th ed.) Citation

Liu, Xunzhuo, et al. 98× Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router. 2026.

Warning: These citations may not always be 100% accurate.