Yu, H., Cui, X., Zhang, H., Wang, H., & Wang, H. (2025). Taming Latency-Memory Trade-Off in MoE-Based LLM Serving via Fine-Grained Expert Offloading.
Chicago Style (17th ed.) Citation: Yu, Hanfei, Xingqi Cui, Hong Zhang, Hao Wang, and Hao Wang. Taming Latency-Memory Trade-Off in MoE-Based LLM Serving via Fine-Grained Expert Offloading. 2025.
MLA (9th ed.) Citation: Yu, Hanfei, et al. Taming Latency-Memory Trade-Off in MoE-Based LLM Serving via Fine-Grained Expert Offloading. 2025.