Staff View: Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data :: Library Catalog

Bibliographic Details
Main Authors:	Holzmüller, David; Grinsztajn, Léo; Steinwart, Ingo
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2407.04491

_version_	1866913650384568320
author	Holzmüller, David; Grinsztajn, Léo; Steinwart, Ingo
author_facet	Holzmüller, David; Grinsztajn, Léo; Steinwart, Ingo
contents	For classification and regression on tabular data, the dominance of gradient-boosted decision trees (GBDTs) has recently been challenged by often much slower deep learning methods with extensive hyperparameter tuning. We address this discrepancy by introducing (a) RealMLP, an improved multilayer perceptron (MLP), and (b) strong meta-tuned default parameters for GBDTs and RealMLP. We tune RealMLP and the default parameters on a meta-train benchmark with 118 datasets and compare them to hyperparameter-optimized versions on a disjoint meta-test benchmark with 90 datasets, as well as the GBDT-friendly benchmark by Grinsztajn et al. (2022). Our benchmark results on medium-to-large tabular datasets (1K–500K samples) show that RealMLP offers a favorable time-accuracy tradeoff compared to other neural baselines and is competitive with GBDTs in terms of benchmark scores. Moreover, a combination of RealMLP and GBDTs with improved default parameters can achieve excellent results without hyperparameter tuning. Finally, we demonstrate that some of RealMLP's improvements can also considerably improve the performance of TabR with default parameters.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_04491
institution	arXiv
publishDate	2024
record_format	arxiv
title	Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data
topic	Machine Learning
url	https://arxiv.org/abs/2407.04491
