Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autores principales:	Yu, Tao, Gupta, Gaurav, Gopalswamy, Karthick, Mamidala, Amith, Zhou, Hao, Huynh, Jeffrey, Park, Youngsuk, Diamant, Ron, Deoras, Anoop, Huan, Luke
Formato:	Preprint
Publicado:	2024
Materias:	Machine Learning
Acceso en línea:	https://arxiv.org/abs/2405.03637
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866913342842470400
author	Yu, Tao Gupta, Gaurav Gopalswamy, Karthick Mamidala, Amith Zhou, Hao Huynh, Jeffrey Park, Youngsuk Diamant, Ron Deoras, Anoop Huan, Luke
author_facet	Yu, Tao Gupta, Gaurav Gopalswamy, Karthick Mamidala, Amith Zhou, Hao Huynh, Jeffrey Park, Youngsuk Diamant, Ron Deoras, Anoop Huan, Luke
contents	Large models training is plagued by the intense compute cost and limited hardware memory. A practical solution is low-precision representation but is troubled by loss in numerical accuracy and unstable training rendering the model less useful. We argue that low-precision floating points can perform well provided the error is properly compensated at the critical locations in the training process. We propose Collage which utilizes multi-component float representation in low-precision to accurately perform operations with numerical errors accounted. To understand the impact of imprecision to training, we propose a simple and novel metric which tracks the lost information during training as well as differentiates various precision strategies. Our method works with commonly used low-precision such as half-precision ($16$-bit floating points) and can be naturally extended to work with even lower precision such as $8$-bit. Experimental results show that pre-training using Collage removes the requirement of using $32$-bit floating-point copies of the model and attains similar/better training performance compared to $(16, 32)$-bit mixed-precision strategy, with up to $3.7\times$ speedup and $\sim 15\%$ to $23\%$ less memory usage in practice.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_03637
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Collage: Light-Weight Low-Precision Strategy for LLM Training Yu, Tao Gupta, Gaurav Gopalswamy, Karthick Mamidala, Amith Zhou, Hao Huynh, Jeffrey Park, Youngsuk Diamant, Ron Deoras, Anoop Huan, Luke Machine Learning Large models training is plagued by the intense compute cost and limited hardware memory. A practical solution is low-precision representation but is troubled by loss in numerical accuracy and unstable training rendering the model less useful. We argue that low-precision floating points can perform well provided the error is properly compensated at the critical locations in the training process. We propose Collage which utilizes multi-component float representation in low-precision to accurately perform operations with numerical errors accounted. To understand the impact of imprecision to training, we propose a simple and novel metric which tracks the lost information during training as well as differentiates various precision strategies. Our method works with commonly used low-precision such as half-precision ($16$-bit floating points) and can be naturally extended to work with even lower precision such as $8$-bit. Experimental results show that pre-training using Collage removes the requirement of using $32$-bit floating-point copies of the model and attains similar/better training performance compared to $(16, 32)$-bit mixed-precision strategy, with up to $3.7\times$ speedup and $\sim 15\%$ to $23\%$ less memory usage in practice.
title	Collage: Light-Weight Low-Precision Strategy for LLM Training
topic	Machine Learning
url	https://arxiv.org/abs/2405.03637

Ejemplares similares