Bibliographic Details
Main Author: Press, William H.
Format: Preprint
Published: 2025
Subjects: General Literature; Computation and Language; Machine Learning; 68T01; I.2.1
Online Access: https://arxiv.org/abs/2511.11572
author Press, William H.
contents The current standard architecture of Large Language Models (LLMs) with QKV self-attention is briefly summarized, including the architecture of a typical Transformer. Scaling laws for compute (flops) and memory (parameters plus data) are given, along with their present (2025) rough cost estimates for the parameters of present LLMs of various scales, including discussion of whether DeepSeek should be viewed as a special case. Nothing here is new, but this material seems not otherwise readily available in summary form.
format Preprint
id arxiv_https___arxiv_org_abs_2511_11572
institution arXiv
publishDate 2025
record_format arxiv
title LLM Architecture, Scaling Laws, and Economics: A Quick Summary
topic General Literature
Computation and Language
Machine Learning
68T01
I.2.1
url https://arxiv.org/abs/2511.11572