

Bibliographic Details
Main Authors:	He, Xi; Miao, Yi; Little, Max A.
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2505.05740

author	He, Xi; Miao, Yi; Little, Max A.
contents	This paper introduces the first globally optimal algorithm for the empirical risk minimization problem of two-layer maxout and ReLU networks, i.e., minimizing the number of misclassifications. The algorithm has a worst-case time complexity of $O\left(N^{DK+1}\right)$, where $N$ denotes the number of data points, $K$ the number of hidden neurons and $D$ the number of features. It can be generalized to accommodate arbitrary computable loss functions without affecting its computational complexity. Our experiments demonstrate that the proposed algorithm provides provably exact solutions for small-scale datasets. To handle larger datasets, we introduce a novel coreset selection method that reduces the data size to a manageable scale, making it feasible for our algorithm. This extension enables efficient processing of large-scale datasets and achieves significantly improved performance, with a 20-30% reduction in misclassifications for both training and prediction compared to state-of-the-art approaches (neural networks trained using gradient descent and support vector machines) when applied to the same models (two-layer networks with fixed hidden nodes and linear models). The artifacts of the Deep-ICE algorithm can be found at https://github.com/XiHegrt/DeepICE-algorithm-artifacts.
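The objective named in the abstract, the number of misclassifications (0-1 loss), can be made concrete with a small Python sketch. It assumes a two-layer ReLU network with one linear output unit whose sign gives the predicted label in {-1, +1}; that architecture, the tie-breaking rule and all function names (`predict`, `zero_one_loss`, `worst_case_bound`) are illustrative assumptions, not the Deep-ICE code. The sketch only evaluates the loss for given weights; it does not perform the globally optimal search the paper describes. The last helper simply evaluates the stated bound $N^{DK+1}$ to show how quickly it grows.

```python
from typing import List, Sequence


def relu(z: float) -> float:
    """Rectified linear unit."""
    return max(z, 0.0)


def predict(x: Sequence[float], W: List[List[float]], b: List[float],
            v: List[float], c: float) -> int:
    """Assumed model: sign(v . relu(W x + b) + c), ties mapped to +1."""
    hidden = [relu(sum(wi * xi for wi, xi in zip(w_k, x)) + b_k)
              for w_k, b_k in zip(W, b)]          # K hidden activations
    score = sum(vk * hk for vk, hk in zip(v, hidden)) + c
    return 1 if score >= 0 else -1


def zero_one_loss(X, y, W, b, v, c) -> int:
    """0-1 loss: count of points whose predicted label differs from y."""
    return sum(1 for x, t in zip(X, y) if predict(x, W, b, v, c) != t)


def worst_case_bound(N: int, D: int, K: int) -> int:
    """Evaluate N**(D*K + 1), the stated worst-case complexity term."""
    return N ** (D * K + 1)


# Hypothetical XOR example (D = 2 features, K = 2 hidden ReLU units).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [-1, 1, 1, -1]
W = [[1.0, 1.0], [1.0, 1.0]]   # both units see x1 + x2
b = [0.0, -1.0]                # second unit fires only when x1 + x2 > 1
v = [1.0, -2.0]
c = -0.5

print(zero_one_loss(X, y, W, b, v, c))   # 0: XOR is fit exactly
print(worst_case_bound(100, 2, 2))       # 10000000000
```

With $N = 100$, $D = 2$ and $K = 2$ the bound already reaches $100^5 = 10^{10}$, which illustrates why the abstract pairs the exact algorithm with a coreset step for larger datasets.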
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_05740
institution	arXiv
publishDate	2025
record_format	arxiv
title	Deep-ICE: the first globally optimal algorithm for minimizing 0-1 loss in two-layer ReLU and maxout networks
topic	Machine Learning
url	https://arxiv.org/abs/2505.05740

Similar Items