Staff View: Towards Understanding the Optimization Mechanisms in Deep Learning :: Library Catalog

Bibliographic Details
Main Authors:	Qi, Binchuan; Gong, Wei; Li, Li
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2503.23016

_version_	1866915218624348160
author	Qi, Binchuan; Gong, Wei; Li, Li
author_facet	Qi, Binchuan; Gong, Wei; Li, Li
contents	In this paper, we adopt a probability distribution estimation perspective to explore the optimization mechanisms of supervised classification using deep neural networks. We demonstrate that, when employing the Fenchel-Young loss, despite the non-convex nature of the fitting error with respect to the model's parameters, global optimal solutions can be approximated by simultaneously minimizing both the gradient norm and the structural error. The former can be controlled through gradient descent algorithms. For the latter, we prove that it can be managed by increasing the number of parameters and ensuring parameter independence, thereby providing theoretical insights into mechanisms such as over-parameterization and random initialization. Ultimately, the paper validates the key conclusions of the proposed method through empirical results, illustrating its practical effectiveness.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_23016
institution	arXiv
publishDate	2025
record_format	arxiv
title	Towards Understanding the Optimization Mechanisms in Deep Learning
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2503.23016