Bibliographic Details
Main Author: Gan, Eric
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2604.02653
author Gan, Eric
author_facet Gan, Eric
contents Empirically, modern deep learning training often occurs at the Edge of Stability (EoS), where the sharpness of the loss exceeds the threshold below which classical convergence analysis applies. Despite recent progress, existing theoretical explanations of EoS either rely on restrictive assumptions or focus on specific squared-loss-type objectives. In this work, we introduce and study a structural property of loss functions that we term product-stability. We show that for losses with product-stable minima, gradient descent applied to objectives of the form $(x,y) \mapsto l(xy)$ can provably converge to the local minimum even when training in the EoS regime. This framework substantially generalizes prior results and applies to a broad class of losses, including binary cross entropy. Using bifurcation diagrams, we characterize the resulting training dynamics, explain the emergence of stable oscillations, and precisely quantify the sharpness at convergence. Together, our results offer a principled explanation for stable EoS training for a wider class of loss functions.
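The setting described in the abstract, gradient descent on an objective of the form $(x,y) \mapsto l(xy)$, can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's method: the squared loss `l(z) = 0.5 * (z - 1)**2` is a stand-in chosen for simplicity, and the function names (`dl`, `d2l`, `gd_step`, `sharpness`, `run`) are hypothetical. Sharpness is taken here to mean the largest eigenvalue of the Hessian, and `2 / lr` is the usual classical stability threshold for step size `lr`.

```python
import math

# Illustrative stand-in loss (an assumption, not taken from the record):
# l(z) = 0.5 * (z - 1)^2, so l'(z) = z - 1 and l''(z) = 1.
def dl(z):
    return z - 1.0

def d2l(z):
    return 1.0

def gd_step(x, y, lr):
    """One gradient descent step on f(x, y) = l(x * y)."""
    g = dl(x * y)                      # l'(xy)
    return x - lr * g * y, y - lr * g * x

def sharpness(x, y):
    """Largest eigenvalue of the Hessian of f(x, y) = l(x * y)."""
    z = x * y
    a = d2l(z) * y * y                 # f_xx = l''(xy) * y^2
    b = d2l(z) * x * y + dl(z)         # f_xy = l''(xy) * x * y + l'(xy)
    c = d2l(z) * x * x                 # f_yy = l''(xy) * x^2
    # Closed-form top eigenvalue of the symmetric matrix [[a, b], [b, c]].
    return 0.5 * (a + c) + math.sqrt(0.25 * (a - c) ** 2 + b * b)

def run(x, y, lr, steps):
    """Iterate gradient descent for a fixed number of steps."""
    for _ in range(steps):
        x, y = gd_step(x, y, lr)
    return x, y

if __name__ == "__main__":
    lr = 0.1
    x, y = run(1.5, 0.5, lr, 1000)
    print("product xy:", x * y)
    print("sharpness:", sharpness(x, y), "threshold 2/lr:", 2.0 / lr)
```

Comparing `sharpness(x, y)` with `2 / lr` along a trajectory shows whether an iterate sits below the classical threshold or in the Edge of Stability regime. Swapping `dl` and `d2l` for the derivatives of another loss lets the same loop be tried on other objectives of this product form.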
format Preprint
id arxiv_https___arxiv_org_abs_2604_02653
institution arXiv
publishDate 2026
record_format arxiv
title Product-Stability: Provable Convergence for Gradient Descent on the Edge of Stability
topic Machine Learning
url https://arxiv.org/abs/2604.02653