Staff View: Lions and Muons: Optimization via Stochastic Frank-Wolfe :: Library Catalog

Bibliographic Details
Main Authors:	Sfyraki, Maria-Eleni; Wang, Jun-Kun
Format:	Preprint
Published:	2025
Subjects:	Optimization and Control; Machine Learning
Online Access:	https://arxiv.org/abs/2506.04192

_version_	1866908801338179584
author	Sfyraki, Maria-Eleni; Wang, Jun-Kun
author_facet	Sfyraki, Maria-Eleni; Wang, Jun-Kun
contents	Stochastic Frank-Wolfe is a classical method for solving constrained optimization problems. Meanwhile, recent optimizers such as Lion and Muon have gained significant popularity in deep learning. In this work, building on recent initiatives, we provide a unifying perspective by interpreting these seemingly disparate methods through the lens of Stochastic Frank-Wolfe. Specifically, we show that Lion and Muon with weight decay can be viewed as special instances of Stochastic Frank-Wolfe, and we establish their convergence guarantees in terms of the Frank-Wolfe gap, a standard stationarity measure in non-convex optimization for Frank-Wolfe methods. We further find that convergence to this gap implies convergence to a KKT point of the original problem under a norm constraint for Lion and Muon. Moreover, motivated by recent empirical findings that stochastic gradients in modern machine learning tasks often exhibit heavy-tailed distributions, we extend Stochastic Frank-Wolfe to settings with heavy-tailed noise by developing two robust variants with strong theoretical guarantees that hold for general compact convex sets without the need for a large batch size, filling a gap in the literature on Stochastic Frank-Wolfe for non-convex optimization. Our contributions in the later part of this work, in turn, yield new variants of Lion and Muon that better accommodate heavy-tailed gradient noise, thereby enhancing their practical scope.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_04192
institution	arXiv
publishDate	2025
record_format	arxiv
title	Lions and Muons: Optimization via Stochastic Frank-Wolfe
topic	Optimization and Control; Machine Learning
url	https://arxiv.org/abs/2506.04192

Similar Items