Bibliographic Details
Main Authors: Li, Johnny Jingze, George, Vivek Kurien, Silva, Gabriel A.
Format: Preprint
Published: 2024
Subjects: Machine Learning; Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2407.19044
author Li, Johnny Jingze
George, Vivek Kurien
Silva, Gabriel A.
contents Emergence in machine learning refers to the spontaneous appearance of complex behaviors or capabilities that arise from the scale and structure of training data and model architectures, despite not being explicitly programmed. We introduce a novel yet straightforward neural network initialization scheme that aims at achieving greater potential for emergence. Measuring emergence as a kind of structural nonlinearity, our method adjusts the layer-wise weight scaling factors to achieve higher emergence values. This enhancement is easy to implement, requiring no additional optimization steps for initialization compared to GradInit. We evaluate our approach across various architectures, including MLP and convolutional architectures for image recognition and transformers for machine translation. We demonstrate substantial improvements in both model accuracy and training speed, with and without batch normalization. The simplicity, theoretical innovation, and demonstrable empirical advantages of our method make it a potent enhancement to neural network initialization practices. These results suggest a promising direction for leveraging emergence to improve neural network training methodologies. Code is available at: https://github.com/johnnyjingzeli/EmergenceInit.
format Preprint
id arxiv_https___arxiv_org_abs_2407_19044
institution arXiv
publishDate 2024
record_format arxiv
title Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme
topic Machine Learning
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2407.19044