Bibliographic Details
Main Authors: Yang, Yunfei; Zhou, Ding-Xuan
Format: Preprint
Published: 2023
Subjects: Machine Learning; Statistics Theory
Online Access: https://arxiv.org/abs/2306.08321
author Yang, Yunfei
Zhou, Ding-Xuan
author_facet Yang, Yunfei
Zhou, Ding-Xuan
contents It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes, if the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression of estimating an unknown $d$-variate function by using shallow ReLU neural networks. It is assumed that the regression function is from the Hölder space with smoothness $\alpha<(d+3)/2$ or a variation space corresponding to shallow neural networks, which can be viewed as an infinitely wide neural network. In this setting, we prove that least squares estimators based on shallow neural networks with certain norm constraints on the weights are minimax optimal, if the network width is sufficiently large. As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks, which may be of independent interest.
format Preprint
id arxiv_https___arxiv_org_abs_2306_08321
institution arXiv
publishDate 2023
record_format arxiv
title Nonparametric regression using over-parameterized shallow ReLU neural networks
topic Machine Learning
Statistics Theory
url https://arxiv.org/abs/2306.08321