Staff View: Minimum width for universal approximation using squashable activation functions :: Library Catalog

Bibliographic Details
Main Authors:	Shin, Jonghyun; Kim, Namjun; Hwang, Geonho; Park, Sejun
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2504.07371

_version_	1866916682958635008
author	Shin, Jonghyun; Kim, Namjun; Hwang, Geonho; Park, Sejun
author_facet	Shin, Jonghyun; Kim, Namjun; Hwang, Geonho; Park, Sejun
contents	The exact minimum width that allows for universal approximation of unbounded-depth networks is known only for ReLU and its variants. In this work, we study the minimum width of networks using general activation functions. Specifically, we focus on squashable functions that can approximate the identity function and binary step function by alternately composing with affine transformations. We show that for networks using a squashable activation function to universally approximate $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$, the minimum width is $\max\{d_x,d_y,2\}$ unless $d_x=d_y=1$; the same bound holds for $d_x=d_y=1$ if the activation function is monotone. We then provide sufficient conditions for squashability and show that all non-affine analytic functions and a class of piecewise functions are squashable, i.e., our minimum width result holds for those general classes of activation functions.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_07371
institution	arXiv
publishDate	2025
record_format	arxiv
title	Minimum width for universal approximation using squashable activation functions
topic	Machine Learning
url	https://arxiv.org/abs/2504.07371

Similar Items