Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Zhao, Yize, Thrampoulidis, Christos
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.08348
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866915537630527488
author	Zhao, Yize Thrampoulidis, Christos
author_facet	Zhao, Yize Thrampoulidis, Christos
contents	We investigate how next-token prediction (NTP) optimization leads language models to extract and organize semantic structure from text. Our analysis, based on a tractable mathematical model and controlled synthetic data, reveals that NTP implicitly guides models to factor a centered support matrix encoding context-to-next-token co-occurrence patterns via singular value decomposition (SVD). While models never explicitly construct this matrix, learned word and context embeddings converge to its SVD factors, with singular vectors encoding latent semantic concepts through their sign patterns. We demonstrate that concepts corresponding to larger singular values are learned earlier during training, yielding a natural semantic hierarchy where broad categories emerge before fine-grained ones. This insight motivates orthant-based clustering, a method that combines concept signs to identify interpretable semantic categories. We validate our findings on synthetic datasets and pretrained language models, recovering diverse semantic structures such as grammatical categories, named entity types, and topical distinctions (medical, entertainment). Our work bridges classical distributional semantics and neural collapse geometry, characterizing how gradient-based optimization implicitly determines both the matrix representation and factorization method that encode semantic structure.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_08348
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations Zhao, Yize Thrampoulidis, Christos Computation and Language We investigate how next-token prediction (NTP) optimization leads language models to extract and organize semantic structure from text. Our analysis, based on a tractable mathematical model and controlled synthetic data, reveals that NTP implicitly guides models to factor a centered support matrix encoding context-to-next-token co-occurrence patterns via singular value decomposition (SVD). While models never explicitly construct this matrix, learned word and context embeddings converge to its SVD factors, with singular vectors encoding latent semantic concepts through their sign patterns. We demonstrate that concepts corresponding to larger singular values are learned earlier during training, yielding a natural semantic hierarchy where broad categories emerge before fine-grained ones. This insight motivates orthant-based clustering, a method that combines concept signs to identify interpretable semantic categories. We validate our findings on synthetic datasets and pretrained language models, recovering diverse semantic structures such as grammatical categories, named entity types, and topical distinctions (medical, entertainment). Our work bridges classical distributional semantics and neural collapse geometry, characterizing how gradient-based optimization implicitly determines both the matrix representation and factorization method that encode semantic structure.
title	Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations
topic	Computation and Language
url	https://arxiv.org/abs/2505.08348

Similar Items