Staff View: Expanded Gating Ranges Improve Activation Functions :: Library Catalog

Bibliographic Details
Main Author:	Huang, Allen Hao
Format:	Preprint
Published:	2024
Subjects:	Neural and Evolutionary Computing; Machine Learning
Online Access:	https://arxiv.org/abs/2405.20768

_version_	1866909213980098560
author	Huang, Allen Hao
author_facet	Huang, Allen Hao
contents	Activation functions are core components of all deep learning architectures. Currently, the most popular activation functions are smooth ReLU variants like GELU and SiLU. These are self-gated activation functions where the range of the gating function is between zero and one. In this paper, we explore the viability of using arctan as a gating mechanism. A self-gated activation function that uses arctan as its gating function has a monotonically increasing first derivative. To make this activation function competitive, it is necessary to introduce a trainable parameter for every MLP block to expand the range of the gating function beyond zero and one. We find that this technique also improves existing self-gated activation functions. We conduct an empirical evaluation of Expanded ArcTan Linear Unit (xATLU), Expanded GELU (xGELU), and Expanded SiLU (xSiLU) and show that they outperform existing activation functions within a transformer architecture. Additionally, expanded gating ranges show promising results in improving first-order Gated Linear Units (GLU).
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_20768
institution	arXiv
publishDate	2024
record_format	arxiv
title	Expanded Gating Ranges Improve Activation Functions
topic	Neural and Evolutionary Computing; Machine Learning
url	https://arxiv.org/abs/2405.20768
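
The abstract in the contents field describes a self-gated activation whose gate is built from arctan and whose gating range is widened beyond zero and one by a trainable parameter. The record gives no formulas, so the following is only a minimal sketch under stated assumptions: the gate is arctan rescaled to (0, 1), and the expansion is symmetric, mapping (0, 1) to (-alpha, 1 + alpha). The function names and the parameter name `alpha` are hypothetical, and `alpha` is a plain float here rather than a trained per-block parameter.

```python
import math


def arctan_gate(x: float) -> float:
    """Arctan rescaled so its output lies in (0, 1), like a sigmoid-style gate."""
    return (math.atan(x) + math.pi / 2) / math.pi


def expanded_gate(g: float, alpha: float) -> float:
    """Stretch a gate value from (0, 1) to (-alpha, 1 + alpha).

    Assumption: a symmetric linear expansion; the record does not state the form.
    """
    return g * (1 + 2 * alpha) - alpha


def xatlu(x: float, alpha: float = 0.0) -> float:
    """Self-gated activation: the input multiplied by its own (expanded) gate."""
    return x * expanded_gate(arctan_gate(x), alpha)


if __name__ == "__main__":
    for alpha in (0.0, 0.5):
        print(alpha, [round(xatlu(x, alpha), 4) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)])
```

With `alpha = 0.0` the gate stays inside (0, 1), as the abstract says of existing self-gated functions; a positive `alpha` lets the gate go below zero or above one. The same `expanded_gate` helper could wrap a sigmoid or Gaussian-CDF gate to sketch the expanded SiLU and GELU variants the abstract mentions.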