Bibliographic Details
Main author:	Yun, Vincent-Daniel
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition; Information Theory
Online access:	https://arxiv.org/abs/2509.03677

_version_	1866915483843821568
author	Yun, Vincent-Daniel
author_facet	Yun, Vincent-Daniel
contents	Gradient dynamics play a central role in determining the stability and generalization of deep neural networks. In this work, we provide an empirical analysis of how variance and standard deviation of gradients evolve during training, showing consistent changes across layers and at the global scale in convolutional networks. Motivated by these observations, we propose a hyperparameter-free gradient normalization method that aligns gradient scaling with their natural evolution. This approach prevents unintended amplification, stabilizes optimization, and preserves convergence guarantees. Experiments on the challenging CIFAR-100 benchmark with ResNet-20, ResNet-56, and VGG-16-BN demonstrate that our method maintains or improves test accuracy even under strong generalization. Beyond practical performance, our study highlights the importance of directly tracking gradient dynamics, aiming to bridge the gap between theoretical expectations and empirical behaviors, and to provide insights for future optimization research.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_03677
institution	arXiv
publishDate	2025
record_format	arxiv
title	Insights from Gradient Dynamics: Gradient Autoscaled Normalization
topic	Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition; Information Theory
url	https://arxiv.org/abs/2509.03677
