Song, M., Tang, X., Hou, F., Li, J., Wei, W., Ma, Y., . . . Long, G. (2024). Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels.
Chicago Style (17th ed.) Citation: Song, Mingcong, et al. Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels. 2024.
MLA (9th ed.) Citation: Song, Mingcong, et al. Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels. 2024.