He, X., Zhang, S., Tang, K., Shi, S., Wang, Y., Zeng, Z., . . . Ong, Y. S. (2024). ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching and Token Scheduling.
Chicago Style (17th ed.) Citation: He, Xin, et al. ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching and Token Scheduling. 2024.
MLA (9th ed.) Citation: He, Xin, et al. ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching and Token Scheduling. 2024.