Staff View: Predicting Large Model Test Losses with a Noisy Quadratic System :: Library Catalog

Bibliographic Details
Main Authors:	Li, Chuning; Maddison, Chris J.
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.09154

_version_	1866910205769416704
author	Li, Chuning; Maddison, Chris J.
author_facet	Li, Chuning; Maddison, Chris J.
contents	We introduce a predictive model that estimates the pre-training loss of large models from model size (N), batch size (B), and number of weight updates (K). This is the first loss prediction model that can handle changing batch size. The model outperforms Chinchilla's loss model, which models the test loss using model size and number of tokens, at projecting the loss to extrapolated compute budgets (up to 1000-fold). A natural use of the model is to find optimal N, B, K configurations under explicit and compound resource constraints such as time, memory, and compute. In our experiments, the model-selected configurations are close to the ground-truth optimum. Our work advocates for loss prediction as a better alternative to heuristic-based laws, which are growing in complexity. The implementation is available at https://github.com/chuningxdy/Noisy-Quadratic-System.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_09154
institution	arXiv
publishDate	2026
record_format	arxiv
title	Predicting Large Model Test Losses with a Noisy Quadratic System
topic	Machine Learning
url	https://arxiv.org/abs/2605.09154
