Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Kristiansen, Gus; Sandler, Mark; Zhmoginov, Andrey; Miller, Nolan; Goyal, Anirudh; Lee, Jihwan; Vladymyrov, Max
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2408.09310

_version_	1866913533222977536
author	Kristiansen, Gus; Sandler, Mark; Zhmoginov, Andrey; Miller, Nolan; Goyal, Anirudh; Lee, Jihwan; Vladymyrov, Max
author_facet	Kristiansen, Gus; Sandler, Mark; Zhmoginov, Andrey; Miller, Nolan; Goyal, Anirudh; Lee, Jihwan; Vladymyrov, Max
contents	In modern deep learning, the models are learned by applying gradient updates using an optimizer, which transforms the updates based on various statistics. Optimizers are often hand-designed and tuning their hyperparameters is a big part of the training process. Learned optimizers have shown some initial promise, but are generally unsuccessful as a general optimization mechanism applicable to every problem. In this work we explore a different direction: instead of learning general optimizers, we instead specialize them to a specific training environment. We propose a novel optimizer technique that learns a layer-specific linear combination of update directions provided by a set of base optimizers, effectively adapting its strategy to the specific model and dataset. When evaluated on image classification tasks, this specialized optimizer significantly outperforms both traditional off-the-shelf methods such as Adam, as well as existing general learned optimizers. Moreover, it demonstrates robust generalization with respect to model initialization, evaluating on unseen datasets, and training durations beyond its meta-training horizon.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_09310
institution	arXiv
publishDate	2024
record_format	arxiv
title	Narrowing the Focus: Learned Optimizers for Pretrained Models
topic	Machine Learning
url	https://arxiv.org/abs/2408.09310
