Staff View: Same Geometry, Opposite Noise: Transformer Magnitude Representations Lack Scalar Variability

Bibliographic Details
Main Author:	Cacioli, Jon-Paul
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Quantitative Methods
Online Access:	https://arxiv.org/abs/2604.04469

_version_	1866917385790816256
author	Cacioli, Jon-Paul
author_facet	Cacioli, Jon-Paul
contents	Scalar variability -- the finding that representational noise scales proportionally with magnitude, producing a constant coefficient of variation -- is a hallmark of biological magnitude systems. We tested whether transformer language models exhibit this property by analysing the dispersion of hidden-state representations across carrier sentences for 26 numerical magnitudes in three 7-8B parameter models (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3, Llama-3-8B-Base; data from Cacioli, 2026). We found the opposite: representational variability decreased with magnitude along the magnitude axis (scaling exponent alpha approx -0.19; 0/16 primary layers with alpha > 0, all three models). The negative sign was consistent in full-dimensional space (alpha approx -0.04) and after sentence-identity correction (alpha approx -0.007). The anti-scalar pattern was 3-5x stronger along the magnitude axis than orthogonal dimensions, and corpus frequency strongly predicted per-magnitude variability (rho = .84). These results demonstrate that distributional learning alone is insufficient to produce scalar variability: transformers reproduce log-compressive magnitude geometry but not the constant-CV noise signature observed in biological systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_04469
institution	arXiv
publishDate	2026
record_format	arxiv
title	Same Geometry, Opposite Noise: Transformer Magnitude Representations Lack Scalar Variability
topic	Computation and Language; Quantitative Methods
url	https://arxiv.org/abs/2604.04469

Similar Items