Bibliographic Details
Main Authors: Uchino, Yuki, Ozaki, Katsuhisa, Imamura, Toshiyuki
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2602.02549
Table of Contents:
  • The Ozaki-II scheme is an emulation method that leverages the Chinese Remainder Theorem to compute high-precision matrix multiplication via a sequence of low-precision matrix multiplications. In this scheme, the attainable numerical accuracy improves as the number of low-precision matrix multiplications increases. Previous numerical studies have shown that single- and double-precision matrix multiplication using the Ozaki-II scheme achieves higher throughput than that of standard BLAS routines on modern AI hardware equipped with fast INT8 matrix multiply-accumulate units with INT8 inputs and INT32 accumulation. However, the accuracy of the Ozaki-II scheme can degrade when the exponent distribution of the input matrices is wide, in which case a large number of low-precision matrix multiplications is required to obtain high-precision results. In this paper, we present a rigorous deterministic error analysis of the Ozaki-II scheme. The proposed analysis not only clarifies the accuracy behavior of the method but also enables the estimation of the number of low-precision matrix multiplications required to achieve a desired level of numerical accuracy.
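The Chinese Remainder Theorem idea described in the abstract can be illustrated with a minimal pure-Python sketch on integer matrices. This is not the authors' implementation. The moduli, the helper names (`sym_mod`, `matmul`, `crt_matmul`), and the restriction to integer inputs are assumptions made for illustration, and the step that scales floating-point inputs to integers is left out entirely. Each residue matrix has entries that fit in INT8, each residue product stands in for one low-precision matrix multiplication, and the results are recombined into the exact product.

```python
from math import prod

# Illustrative pairwise-coprime moduli (an assumption, not taken from the paper).
# Centered residues for each of them fit in a signed 8-bit integer.
MODULI = [256, 255, 253, 251]


def sym_mod(x, m):
    """Centered residue of x modulo m, roughly in [-m/2, m/2)."""
    r = x % m
    return r - m if r >= (m + 1) // 2 else r


def matmul(A, B):
    """Plain exact integer matrix product, used as the reference."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def crt_matmul(A, B, moduli=MODULI):
    """Rebuild A @ B from one small-integer product per modulus.

    The result is exact only if every entry of A @ B has magnitude
    below prod(moduli) / 2.
    """
    M = prod(moduli)
    n_rows, n_cols = len(A), len(B[0])
    C = [[0] * n_cols for _ in range(n_rows)]
    for m in moduli:
        # Reduce the inputs to INT8-sized residues.
        Am = [[sym_mod(x, m) for x in row] for row in A]
        Bm = [[sym_mod(x, m) for x in row] for row in B]
        # Stand-in for an INT8-input, INT32-accumulate matrix multiplication.
        Cm = matmul(Am, Bm)
        # CRT weight: congruent to 1 modulo m and to 0 modulo every other modulus.
        Mi = M // m
        w = Mi * pow(Mi, -1, m)
        for i in range(n_rows):
            for j in range(n_cols):
                C[i][j] += w * (Cm[i][j] % m)
    # Map back to the centered range so negative entries are recovered.
    return [[sym_mod(x, M) for x in row] for row in C]


A = [[1000, -2000], [30000, 4]]
B = [[-7, 1234], [99, -5]]
exact = matmul(A, B)
with_four_moduli = crt_matmul(A, B)
with_two_moduli = crt_matmul(A, B, moduli=[256, 255])
```

In this sketch the four-modulus result matches the exact product, while the two-modulus result does not, because the product of the two moduli is too small to represent the entries. This mirrors, in a simplified integer setting, the abstract's point that accuracy improves as the number of low-precision multiplications grows.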