Bibliographic Details
Main Authors: Nava, Andres; Wyart, Matthieu
Format: Preprint
Published: 2026
Subjects: Computation and Language; Machine Learning
Online Access: https://arxiv.org/abs/2605.23821
_version_ 1866910248636252160
author Nava, Andres
Wyart, Matthieu
author_facet Nava, Andres
Wyart, Matthieu
contents We propose a distributional theory of how hypernymy, the "is-a" relation between general and specific concepts, is encoded geometrically in language representations. Starting from the empirically verified assumption that words closer on the WordNet hypernym graph co-occur more often, we theoretically characterize the spectrum of the resulting Gram matrix of word2vec embeddings. Under mild positivity and decay conditions on the co-occurrence kernel, we prove that the leading eigenvectors first separate broad taxonomic branches and then progressively finer sub-branches, producing a hierarchical splitting geometry with a coarse-to-fine spectral organization that mirrors the tree. We confirm these predictions in word2vec embeddings across many sampled WordNet subtrees, and show that the same signature extends strikingly well to Gemma 2B unembeddings. Our results indicate that hierarchical concept geometry in LLMs need not reflect a hierarchy-specific functional mechanism, but emerges from the spectral structure of pairwise word statistics.
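The coarse-to-fine ordering described in the abstract can be illustrated with a toy model. The sketch below is not the authors' construction: it assumes a balanced binary tree with 8 leaves and a hypothetical kernel `decay ** distance` that is positive and shrinks with tree distance, standing in for the co-occurrence kernel. The function names and the `decay` value are illustrative choices only.

```python
import numpy as np


def leaf_tree_distance(i: int, j: int) -> int:
    """Path length between leaves i and j of a balanced binary tree."""
    steps = 0
    # Climb one level at a time until both leaves reach a common ancestor.
    while i != j:
        i //= 2
        j //= 2
        steps += 1
    return 2 * steps


def toy_kernel(depth: int, decay: float = 0.5) -> np.ndarray:
    """Positive kernel that decays with tree distance (an assumed toy form)."""
    n = 2 ** depth
    kernel = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            kernel[i, j] = decay ** leaf_tree_distance(i, j)
    return kernel


K = toy_kernel(depth=3)

# eigh returns eigenvalues in ascending order; reorder to descending.
eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Sign pattern of the second eigenvector across the 8 leaves.
second_signs = np.sign(eigvecs[:, 1])
print(eigvals)
print(second_signs)
```

In this toy case the leading eigenvector is uniform, the second takes opposite signs on the two top-level branches (leaves 0-3 versus 4-7), and the next, doubly degenerate eigenvalue corresponds to splits within each branch. The overall sign of any eigenvector is arbitrary, so only the grouping of leaves is meaningful.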
format Preprint
id arxiv_https___arxiv_org_abs_2605_23821
institution arXiv
publishDate 2026
record_format arxiv
title Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence
topic Computation and Language
Machine Learning
url https://arxiv.org/abs/2605.23821