Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Author:	Xu, Yongzhong
Format:	Preprint
Published:	2026
Subjects:	Machine Learning Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.18649
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866908846144880640
author	Xu, Yongzhong
author_facet	Xu, Yongzhong
contents	Grokking -- the abrupt transition from memorization to generalization after extended training -- has been linked to the emergence of low-dimensional structure in learning dynamics. Yet neural network parameters inhabit extremely high-dimensional spaces. How can a low-dimensional learning process produce solutions that resist low-dimensional compression? We investigate this question in multi-task modular arithmetic, training shared-trunk Transformers with separate heads for addition, multiplication, and a quadratic operation modulo 97. Across three model scales (315K--2.2M parameters) and five weight decay settings, we compare three reconstruction methods: per-matrix SVD, joint cross-matrix SVD, and trajectory PCA. Across all conditions, grokking trajectories are confined to a 2--6 dimensional global subspace, while individual weight matrices remain effectively full-rank. Reconstruction from 3--5 trajectory PCs recovers over 95\% of final accuracy, whereas both per-matrix and joint SVD fail at sub-full rank. Even when static decompositions capture most spectral energy, they destroy task-relevant structure. These results show that learned algorithms are encoded through dynamically coordinated updates spanning all matrices, rather than localized low-rank components. We term this the holographic encoding principle: grokked solutions are globally low-rank in the space of learning directions but locally full-rank in parameter space, with implications for compression, interpretability, and understanding how neural networks encode computation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_18649
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Global Low-Rank, Local Full-Rank: The Holographic Encoding of Learned Algorithms Xu, Yongzhong Machine Learning Artificial Intelligence Grokking -- the abrupt transition from memorization to generalization after extended training -- has been linked to the emergence of low-dimensional structure in learning dynamics. Yet neural network parameters inhabit extremely high-dimensional spaces. How can a low-dimensional learning process produce solutions that resist low-dimensional compression? We investigate this question in multi-task modular arithmetic, training shared-trunk Transformers with separate heads for addition, multiplication, and a quadratic operation modulo 97. Across three model scales (315K--2.2M parameters) and five weight decay settings, we compare three reconstruction methods: per-matrix SVD, joint cross-matrix SVD, and trajectory PCA. Across all conditions, grokking trajectories are confined to a 2--6 dimensional global subspace, while individual weight matrices remain effectively full-rank. Reconstruction from 3--5 trajectory PCs recovers over 95\% of final accuracy, whereas both per-matrix and joint SVD fail at sub-full rank. Even when static decompositions capture most spectral energy, they destroy task-relevant structure. These results show that learned algorithms are encoded through dynamically coordinated updates spanning all matrices, rather than localized low-rank components. We term this the holographic encoding principle: grokked solutions are globally low-rank in the space of learning directions but locally full-rank in parameter space, with implications for compression, interpretability, and understanding how neural networks encode computation.
title	Global Low-Rank, Local Full-Rank: The Holographic Encoding of Learned Algorithms
topic	Machine Learning Artificial Intelligence
url	https://arxiv.org/abs/2602.18649

Similar Items