Saved in:
Bibliographic Details
Main Authors: He, Chang, Gao, Wenzhi, Jiang, Bo, Udell, Madeleine, Zhang, Shuzhong
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2512.06231
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866910044589654016
author He, Chang
Gao, Wenzhi
Jiang, Bo
Udell, Madeleine
Zhang, Shuzhong
author_facet He, Chang
Gao, Wenzhi
Jiang, Bo
Udell, Madeleine
Zhang, Shuzhong
contents In this paper, we revisit a classical adaptive stepsize strategy for gradient descent: the Polyak stepsize (PolyakGD), originally proposed in Polyak (1969). We study the convergence behavior of PolyakGD from two perspectives: tight worst-case analysis and universality across function classes. As our first main result, we establish the tightness of the known convergence rates of PolyakGD by explicitly constructing worst-case functions. In particular, we show that the $O((1-\frac{1}κ)^K)$ rate for smooth strongly convex functions and the $O(1/K)$ rate for smooth convex functions are both tight. Moreover, we theoretically show that PolyakGD automatically exploits floating-point errors to escape the worst-case behavior. Our second main result provides new convergence guarantees for PolyakGD under both Hölder smoothness and Hölder growth conditions. These findings show that the Polyak stepsize is universal, automatically adapting to various function classes without requiring prior knowledge of problem parameters.
format Preprint
id arxiv_https___arxiv_org_abs_2512_06231
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle New Results on the Polyak Stepsize: Tight Convergence Analysis and Universal Function Classes
He, Chang
Gao, Wenzhi
Jiang, Bo
Udell, Madeleine
Zhang, Shuzhong
Optimization and Control
In this paper, we revisit a classical adaptive stepsize strategy for gradient descent: the Polyak stepsize (PolyakGD), originally proposed in Polyak (1969). We study the convergence behavior of PolyakGD from two perspectives: tight worst-case analysis and universality across function classes. As our first main result, we establish the tightness of the known convergence rates of PolyakGD by explicitly constructing worst-case functions. In particular, we show that the $O((1-\frac{1}κ)^K)$ rate for smooth strongly convex functions and the $O(1/K)$ rate for smooth convex functions are both tight. Moreover, we theoretically show that PolyakGD automatically exploits floating-point errors to escape the worst-case behavior. Our second main result provides new convergence guarantees for PolyakGD under both Hölder smoothness and Hölder growth conditions. These findings show that the Polyak stepsize is universal, automatically adapting to various function classes without requiring prior knowledge of problem parameters.
title New Results on the Polyak Stepsize: Tight Convergence Analysis and Universal Function Classes
topic Optimization and Control
url https://arxiv.org/abs/2512.06231