Wang, J., Wang, M., Zhou, Z., Yan, J., E, W., & Wu, L. (2025). The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training.
Chicago citation (17th ed.): Wang, Jinbo, Mingze Wang, Zhanpeng Zhou, Junchi Yan, Weinan E, and Lei Wu. The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training. 2025.
MLA citation (9th ed.): Wang, Jinbo, et al. The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training. 2025.