Bibliographic Details
Main Authors:	Morwani, Depen; Shapira, Itai; Vyas, Nikhil; Malach, Eran; Kakade, Sham; Janson, Lucas
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Optimization and Control
Online Access:	https://arxiv.org/abs/2406.17748

_version_	1866914848828293120
author	Morwani, Depen; Shapira, Itai; Vyas, Nikhil; Malach, Eran; Kakade, Sham; Janson, Lucas
author_facet	Morwani, Depen; Shapira, Itai; Vyas, Nikhil; Malach, Eran; Kakade, Sham; Janson, Lucas
contents	Shampoo, a second-order optimization algorithm which uses a Kronecker product preconditioner, has recently garnered increasing attention from the machine learning community. The preconditioner used by Shampoo can be viewed either as an approximation of the Gauss-Newton component of the Hessian or the covariance matrix of the gradients maintained by Adagrad. We provide an explicit and novel connection between the optimal Kronecker product approximation of these matrices and the approximation made by Shampoo. Our connection highlights a subtle but common misconception about Shampoo's approximation. In particular, the square of the approximation used by the Shampoo optimizer is equivalent to a single step of the power iteration algorithm for computing the aforementioned optimal Kronecker product approximation. Across a variety of datasets and architectures, we empirically demonstrate that this is close to the optimal Kronecker product approximation. Additionally, for the Hessian approximation viewpoint, we empirically study the impact of various practical tricks to make Shampoo more computationally efficient (such as using the batch gradient and the empirical Fisher) on the quality of Hessian approximation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_17748
institution	arXiv
publishDate	2024
record_format	arxiv
title	A New Perspective on Shampoo's Preconditioner
topic	Machine Learning; Optimization and Control
url	https://arxiv.org/abs/2406.17748
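
The central claim in the abstract (one power-iteration step toward the optimal Kronecker product approximation yields the Shampoo-style factors) can be illustrated numerically. The following is a minimal sketch, not the authors' code. It assumes random Gaussian matrices as stand-ins for real gradients, a row-major vectorization of each gradient, and cosine similarity as the quality measure. The helper names `rearrange`, `cosine`, and `kron_approximations` are hypothetical.

```python
import numpy as np


def rearrange(H, m, n):
    # Map H[(i,j),(k,l)] to A[(i,k),(j,l)] so that np.kron(L, R)
    # becomes the rank-1 matrix outer(vec(L), vec(R)).
    return H.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)


def cosine(X, Y):
    # Cosine similarity between two matrices under the Frobenius inner product.
    return float(np.sum(X * Y) / (np.linalg.norm(X) * np.linalg.norm(Y)))


def kron_approximations(grads):
    # grads has shape (T, m, n): T sample gradients of an m x n weight matrix.
    T, m, n = grads.shape
    g = grads.reshape(T, m * n)      # row-major vec of each gradient
    H = g.T @ g / T                  # second-moment matrix E[g g^T]
    A = rearrange(H, m, n)

    # One power-iteration step started from the identity on each side.
    # This reproduces the factors E[G G^T] and E[G^T G].
    L1 = (A @ np.eye(n).reshape(-1)).reshape(m, m)
    R1 = (A.T @ np.eye(m).reshape(-1)).reshape(n, n)

    # Optimal Kronecker factors: top singular pair of the rearranged matrix.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    L_opt = U[:, 0].reshape(m, m)
    R_opt = Vt[0].reshape(n, n)
    return H, (L1, R1), (L_opt, R_opt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo_grads = rng.standard_normal((64, 3, 4))  # synthetic data, assumption
    H, (L1, R1), (L_opt, R_opt) = kron_approximations(demo_grads)
    print("one power step:", cosine(H, np.kron(L1, R1)))
    print("optimal       :", cosine(H, np.kron(L_opt, R_opt)))
```

Cosine similarity is used here because it is insensitive to the overall scale of the factors, so the sketch does not need to commit to a particular normalization of the one-step approximation.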
