Saved in:
Bibliographic Details
Main Authors: Moshtaghifar, Mohammad, Rodomanov, Anton, Vankov, Daniil, Stich, Sebastian
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2501.10258
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866908980955054080
author Moshtaghifar, Mohammad
Rodomanov, Anton
Vankov, Daniil
Stich, Sebastian
author_facet Moshtaghifar, Mohammad
Rodomanov, Anton
Vankov, Daniil
Stich, Sebastian
contents We present a novel universal gradient method for solving convex optimization problems. Our algorithm, Dual Averaging with Distance Adaptation (DADA), is based on the classical scheme of dual averaging and dynamically adjusts its coefficients based on observed gradients and the distance between iterates and the starting point, eliminating the need for problem-specific parameters. DADA is a universal algorithm that simultaneously works for a broad spectrum of problem classes, provided the local growth of the objective function around its minimizer can be bounded. Particular examples of such problem classes are nonsmooth Lipschitz functions, Lipschitz-smooth functions, Hölder-smooth functions, functions with high-order Lipschitz derivative, quasi-self-concordant functions, and $(L_0,L_1)$-smooth functions. Crucially, DADA is applicable to both unconstrained and constrained problems, even when the domain is unbounded, without requiring prior knowledge of the number of iterations or desired accuracy.
format Preprint
id arxiv_https___arxiv_org_abs_2501_10258
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle DADA: Dual Averaging with Distance Adaptation
Moshtaghifar, Mohammad
Rodomanov, Anton
Vankov, Daniil
Stich, Sebastian
Optimization and Control
Machine Learning
We present a novel universal gradient method for solving convex optimization problems. Our algorithm, Dual Averaging with Distance Adaptation (DADA), is based on the classical scheme of dual averaging and dynamically adjusts its coefficients based on observed gradients and the distance between iterates and the starting point, eliminating the need for problem-specific parameters. DADA is a universal algorithm that simultaneously works for a broad spectrum of problem classes, provided the local growth of the objective function around its minimizer can be bounded. Particular examples of such problem classes are nonsmooth Lipschitz functions, Lipschitz-smooth functions, Hölder-smooth functions, functions with high-order Lipschitz derivative, quasi-self-concordant functions, and $(L_0,L_1)$-smooth functions. Crucially, DADA is applicable to both unconstrained and constrained problems, even when the domain is unbounded, without requiring prior knowledge of the number of iterations or desired accuracy.
title DADA: Dual Averaging with Distance Adaptation
topic Optimization and Control
Machine Learning
url https://arxiv.org/abs/2501.10258