Staff View: LLM Architecture, Scaling Laws, and Economics: A Quick Summary :: Library Catalog

Bibliographic Details
Main Author:	Press, William H.
Format:	Preprint
Published:	2025
Subjects:	General Literature; Computation and Language; Machine Learning; 68T01; I.2.1
Online Access:	https://arxiv.org/abs/2511.11572

_version_	1866915619084959744
author	Press, William H.
author_facet	Press, William H.
contents	The current standard architecture of Large Language Models (LLMs) with QKV self-attention is briefly summarized, including the architecture of a typical Transformer. Scaling laws for compute (flops) and memory (parameters plus data) are given, along with their present (2025) rough cost estimates for the parameters of present LLMs of various scales, including discussion of whether DeepSeek should be viewed as a special case. Nothing here is new, but this material seems not otherwise readily available in summary form.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_11572
institution	arXiv
publishDate	2025
record_format	arxiv
title	LLM Architecture, Scaling Laws, and Economics: A Quick Summary
topic	General Literature; Computation and Language; Machine Learning; 68T01; I.2.1
url	https://arxiv.org/abs/2511.11572
