Bibliographic Details
Main Author: Yun, Vincent-Daniel
Format: Preprint
Published: 2025
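Subjects: Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition; Information Theory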
Online Access: https://arxiv.org/abs/2509.03677
_version_ 1866915483843821568
author Yun, Vincent-Daniel
author_facet Yun, Vincent-Daniel
contents Gradient dynamics play a central role in determining the stability and generalization of deep neural networks. In this work, we provide an empirical analysis of how variance and standard deviation of gradients evolve during training, showing consistent changes across layers and at the global scale in convolutional networks. Motivated by these observations, we propose a hyperparameter-free gradient normalization method that aligns gradient scaling with their natural evolution. This approach prevents unintended amplification, stabilizes optimization, and preserves convergence guarantees. Experiments on the challenging CIFAR-100 benchmark with ResNet-20, ResNet-56, and VGG-16-BN demonstrate that our method maintains or improves test accuracy even under strong generalization. Beyond practical performance, our study highlights the importance of directly tracking gradient dynamics, aiming to bridge the gap between theoretical expectations and empirical behaviors, and to provide insights for future optimization research.
format Preprint
id arxiv_https___arxiv_org_abs_2509_03677
institution arXiv
publishDate 2025
record_format arxiv
title Insights from Gradient Dynamics: Gradient Autoscaled Normalization
topic Machine Learning
Artificial Intelligence
Computer Vision and Pattern Recognition
Information Theory
url https://arxiv.org/abs/2509.03677