Staff View: Dynamic Memory Based Adaptive Optimization :: Library Catalog

Bibliographic Details
Main Authors:	Szegedy, Balázs; Czifra, Domonkos; Kőrösi-Szabó, Péter
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence; Optimization and Control
Online Access:	https://arxiv.org/abs/2402.15262

_version_	1866929254619414528
author	Szegedy, Balázs; Czifra, Domonkos; Kőrösi-Szabó, Péter
author_facet	Szegedy, Balázs; Czifra, Domonkos; Kőrösi-Szabó, Péter
contents	Define an optimizer as having memory $k$ if it stores $k$ dynamically changing vectors in the parameter space. Classical SGD has memory $0$, momentum SGD has memory $1$, and Adam has memory $2$. We address the following questions: How can optimizers make use of more memory units? What information should be stored in them? How should they be used in the learning steps? As an approach to the last question, we introduce a general method called "Retrospective Learning Law Correction", or RLLC for short. This method is designed to calculate a dynamically varying linear combination (called the learning law) of memory units, which themselves may evolve arbitrarily. We demonstrate RLLC on optimizers with small memory ($\leq 4$ memory units) whose memory units have linear update rules. Our experiments show that on a variety of standard problems, these optimizers outperform the three classical optimizers mentioned above. We conclude that RLLC is a promising framework for boosting the performance of known optimizers by adding more memory units and by making them more adaptive.
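The idea of stepping along a linear combination of memory units can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the particular linear update rule (`m_i <- decay_i * m_i + grad`), the decay rates, and the fixed coefficients in `law` are all assumptions made for illustration, and the RLLC rule that would adapt `law` during training is not shown because the abstract does not specify it.

```python
# Illustrative sketch only. The update rule, decay rates, and coefficients
# are hypothetical; the RLLC procedure that adapts `law` is NOT implemented.

def update_memory(memory, decays, grad):
    """Assumed linear update of each memory unit: m_i <- decay_i * m_i + grad."""
    return [
        [d * m + g for m, g in zip(unit, grad)]
        for unit, d in zip(memory, decays)
    ]


def step(params, memory, law):
    """Move params against a linear combination of the memory units."""
    direction = [
        sum(c * unit[j] for c, unit in zip(law, memory))
        for j in range(len(params))
    ]
    return [p - d for p, d in zip(params, direction)]


def minimize(grad_fn, params, decays, law, steps):
    """Run `steps` iterations with k = len(decays) memory units."""
    memory = [[0.0] * len(params) for _ in decays]
    for _ in range(steps):
        memory = update_memory(memory, decays, grad_fn(params))
        params = step(params, memory, law)
    return params, memory


def quadratic_grad(x):
    """Gradient of f(x) = 0.5 * ||x||^2, which is x itself."""
    return list(x)


# Two memory units (k = 2) with a fixed, hand-picked learning law.
final_params, final_memory = minimize(
    quadratic_grad, [1.0, -2.0], decays=[0.5, 0.9], law=[0.05, 0.02], steps=200
)
```

In this sketch the memory size in the abstract's sense is `len(decays)`, since each entry adds one vector with the same shape as the parameters. An adaptive method would replace the constant `law` with coefficients recomputed at each step.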
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_15262
institution	arXiv
publishDate	2024
record_format	arxiv
title	Dynamic Memory Based Adaptive Optimization
topic	Machine Learning; Artificial Intelligence; Optimization and Control
url	https://arxiv.org/abs/2402.15262
