MARC21: SNIP: An Adaptive Mixed Precision Framework for Subbyte Large Language Model Training :: Library Catalog

Bibliographic Details
Main Authors:	Pan, Yunjie; Yang, Yongyi; Yang, Hanmei; Mahlke, Scott
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Hardware Architecture
Online Access:	https://arxiv.org/abs/2602.01410

_version_	1866912867019653120
author	Pan, Yunjie; Yang, Yongyi; Yang, Hanmei; Mahlke, Scott
author_facet	Pan, Yunjie; Yang, Yongyi; Yang, Hanmei; Mahlke, Scott
contents	Training large language models (LLMs) efficiently while preserving model quality poses significant challenges, particularly with subbyte precision supported by state-of-the-art GPUs. Current mixed-precision training approaches either apply uniform precision to all GEMM operations or rely on heuristic-based methods that fail to generalize during training, leading to suboptimal convergence and instability. To address these challenges, this paper introduces SNIP, a fine-grained adaptive mixed-precision training framework for LLM pretraining that supports subbyte precision. SNIP periodically collects statistics on activations, gradients, and optimizer states to assess the precision loss impact on model quality. We define two key metrics: loss divergence in the forward pass, caused by quantization-induced increases in training loss, and weight divergence in the backward pass, which measures error propagation through gradients affecting model updates. These metrics guide an Integer Linear Programming (ILP) problem that systematically optimizes layerwise precision to minimize overall quality loss while meeting efficiency targets. Experiments on 1B, 3B, 7B and 70B Llama-like models demonstrate that SNIP consistently outperforms existing baselines, reducing FLOPs by up to 80% while preserving model quality across different model sizes and training phases with minimal computational overhead.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_01410
institution	arXiv
publishDate	2026
record_format	arxiv
title	SNIP: An Adaptive Mixed Precision Framework for Subbyte Large Language Model Training
topic	Machine Learning; Hardware Architecture
url	https://arxiv.org/abs/2602.01410
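
The abstract in the contents field describes choosing a precision per layer so that overall quality loss is minimized while an efficiency target is met. The sketch below illustrates only that general idea as a tiny multiple-choice knapsack solved by exhaustive search. It is not the SNIP formulation: the option table, the cost and penalty values, the sensitivity scores, and the function name choose_precisions are all hypothetical, and a real system would hand the problem to an ILP solver rather than enumerate it.

```python
from itertools import product

# Hypothetical per-layer options: (format name, relative compute cost, base quality penalty).
# These numbers are illustrative assumptions, not values from the paper.
OPTIONS = [("fp8", 1.0, 0.0), ("fp4", 0.5, 1.0)]


def choose_precisions(sensitivities, budget):
    """Pick one precision per layer by exhaustive search.

    sensitivities[i] scales the quality penalty of lowering layer i's precision.
    budget is the maximum total relative compute cost allowed.
    Returns (total_penalty, assignment), or None if no assignment fits the budget.
    """
    best = None
    for combo in product(range(len(OPTIONS)), repeat=len(sensitivities)):
        cost = sum(OPTIONS[k][1] for k in combo)
        if cost > budget:
            continue  # violates the efficiency target
        penalty = sum(s * OPTIONS[k][2] for s, k in zip(sensitivities, combo))
        if best is None or penalty < best[0]:
            best = (penalty, [OPTIONS[k][0] for k in combo])
    return best


if __name__ == "__main__":
    # Four layers with made-up sensitivities; the budget forces two layers to the cheaper format.
    print(choose_precisions([0.9, 0.1, 0.5, 0.2], budget=3.0))
```

With these made-up inputs, the two least sensitive layers are the ones moved to the cheaper format. Exhaustive search grows exponentially with the number of layers, so it is only suitable for toy inputs like this one.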
