Bibliographic Details
Main Authors: Yang, Yongyi, Poggio, Tomaso, Chuang, Isaac, Ziyin, Liu
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2510.02670
_version_ 1866915531772133376
author Yang, Yongyi
Poggio, Tomaso
Chuang, Isaac
Ziyin, Liu
contents We prove that for a broad class of permutation-equivariant learning rules (including SGD, Adam, and others), the training process induces a bi-Lipschitz mapping between neurons and strongly constrains the topology of the neuron distribution during training. This result reveals a qualitative difference between small and large learning rates $η$. With a learning rate below a topological critical point $η^*$, the training is constrained to preserve all topological structure of the neurons. In contrast, above $η^*$, the learning process allows for topological simplification, making the neuron manifold progressively coarser and thereby reducing the model's expressivity. Viewed in combination with the recent discovery of the edge of stability phenomenon, the learning dynamics of neural networks under gradient descent can be divided into two phases: first they undergo smooth optimization under topological constraints, and then enter a second phase where they learn through drastic topological simplifications. A key feature of our theory is that it is independent of specific architectures or loss functions, enabling the universal application of topological methods to the study of deep learning.
format Preprint
id arxiv_https___arxiv_org_abs_2510_02670
institution arXiv
publishDate 2025
record_format arxiv
title Topological Invariance and Breakdown in Learning
topic Machine Learning
url https://arxiv.org/abs/2510.02670
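As a minimal sketch of what "permutation-equivariant learning rule" in the abstract means, the Python snippet below applies one full-batch gradient-descent step to a toy two-layer tanh network and checks that relabeling the hidden neurons before the step gives the same result as relabeling them after it. The network, the squared loss, the function names (`gd_step`, `neurons`, `distance_ratio_range`), and the choice to treat each hidden unit's incoming weights plus its output weight as one "neuron" are all assumptions made for illustration and are not taken from the paper. The last helper reports the smallest and largest ratio of pairwise neuron distances after versus before one step, which is the kind of quantity a bi-Lipschitz bound would control.

```python
import numpy as np


def gd_step(W, a, X, y, lr):
    """One full-batch gradient-descent step for f(x) = a^T tanh(W x).

    Loss is (1 / 2n) * sum((f(x_k) - y_k)^2). Toy model, assumed for illustration.
    """
    H = np.tanh(X @ W.T)                 # (n, m) hidden activations
    err = H @ a - y                      # (n,) residuals
    n = X.shape[0]
    grad_a = H.T @ err / n               # (m,) gradient w.r.t. output weights
    # (n, m) per-sample, per-neuron factor, then contracted with the inputs
    grad_W = ((err[:, None] * a[None, :]) * (1.0 - H**2)).T @ X / n   # (m, d)
    return W - lr * grad_W, a - lr * grad_a


def neurons(W, a):
    """Stack each hidden unit's incoming weights and output weight into one row."""
    return np.concatenate([W, a[:, None]], axis=1)


def distance_ratio_range(W, a, X, y, lr):
    """Min and max of (pairwise distance after one step) / (distance before)."""
    before = neurons(W, a)
    after = neurons(*gd_step(W, a, X, y, lr))
    i, j = np.triu_indices(len(a), k=1)  # all unordered neuron pairs
    d0 = np.linalg.norm(before[i] - before[j], axis=1)
    d1 = np.linalg.norm(after[i] - after[j], axis=1)
    ratio = d1 / d0
    return ratio.min(), ratio.max()


rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 3)), rng.normal(size=32)
W, a = rng.normal(size=(5, 3)), rng.normal(size=5)
perm = rng.permutation(5)

# Equivariance check: step(permute(theta)) == permute(step(theta)).
W1, a1 = gd_step(W, a, X, y, lr=0.1)
W2, a2 = gd_step(W[perm], a[perm], X, y, lr=0.1)
equivariant = np.allclose(W1[perm], W2) and np.allclose(a1[perm], a2)
print("permutation-equivariant:", equivariant)

for lr in (0.01, 0.1, 1.0):
    lo, hi = distance_ratio_range(W, a, X, y, lr)
    print(f"lr={lr}: distance ratio in [{lo:.4f}, {hi:.4f}]")
```

The printed ratio range only describes this toy model for a single step. It does not estimate the critical learning rate $η^*$ from the abstract, which this sketch makes no attempt to compute.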