Staff View: Pre-trained Gaussian Processes for Bayesian Optimization :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Zi; Dahl, George E.; Swersky, Kevin; Lee, Chansoo; Nado, Zachary; Gilmer, Justin; Snoek, Jasper; Ghahramani, Zoubin
Format:	Preprint
Published:	2021
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2109.08215

_version_	1866929447192494080
author	Wang, Zi; Dahl, George E.; Swersky, Kevin; Lee, Chansoo; Nado, Zachary; Gilmer, Justin; Snoek, Jasper; Ghahramani, Zoubin
author_facet	Wang, Zi; Dahl, George E.; Swersky, Kevin; Lee, Chansoo; Nado, Zachary; Gilmer, Justin; Snoek, Jasper; Ghahramani, Zoubin
contents	Bayesian optimization (BO) has become a popular strategy for global optimization of expensive real-world functions. Contrary to a common expectation that BO is suited to optimizing black-box functions, it actually requires domain knowledge about those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process (GP) priors that specify initial beliefs on functions. However, even with expert knowledge, it is non-trivial to quantitatively define a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. We detail what pre-training entails for GPs using a KL divergence based loss function, and propose a new pre-training based BO framework named HyperBO. Theoretically, we show bounded posterior predictions and near-zero regrets for HyperBO without assuming the "ground truth" GP prior is known. To verify our approach in realistic setups, we collect a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art deep learning models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, HyperBO is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods on both our new tuning dataset and existing multi-task BO benchmarks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2109_08215
institution	arXiv
publishDate	2021
record_format	arxiv
title	Pre-trained Gaussian Processes for Bayesian Optimization
topic	Machine Learning
url	https://arxiv.org/abs/2109.08215

Similar Items