Raposo, D., Ritter, S., Richards, B., Lillicrap, T., Humphreys, P. C., & Santoro, A. (2024). Mixture-of-Depths: Dynamically allocating compute in transformer-based language models.
Chicago Style (17th ed.) Citation: Raposo, David, Sam Ritter, Blake Richards, Timothy Lillicrap, Peter Conway Humphreys, and Adam Santoro. Mixture-of-Depths: Dynamically Allocating Compute in Transformer-based Language Models. 2024.
MLA (9th ed.) Citation: Raposo, David, et al. Mixture-of-Depths: Dynamically Allocating Compute in Transformer-based Language Models. 2024.