Bibliographic Details
Main Authors: Morwani, Depen, Shapira, Itai, Vyas, Nikhil, Malach, Eran, Kakade, Sham, Janson, Lucas
Format: Preprint
Published: 2024
Subjects: Machine Learning; Optimization and Control
Online Access: https://arxiv.org/abs/2406.17748
_version_ 1866914848828293120
author Morwani, Depen
Shapira, Itai
Vyas, Nikhil
Malach, Eran
Kakade, Sham
Janson, Lucas
author_facet Morwani, Depen
Shapira, Itai
Vyas, Nikhil
Malach, Eran
Kakade, Sham
Janson, Lucas
contents Shampoo, a second-order optimization algorithm which uses a Kronecker product preconditioner, has recently garnered increasing attention from the machine learning community. The preconditioner used by Shampoo can be viewed either as an approximation of the Gauss-Newton component of the Hessian or the covariance matrix of the gradients maintained by Adagrad. We provide an explicit and novel connection between the $\textit{optimal}$ Kronecker product approximation of these matrices and the approximation made by Shampoo. Our connection highlights a subtle but common misconception about Shampoo's approximation. In particular, the $\textit{square}$ of the approximation used by the Shampoo optimizer is equivalent to a single step of the power iteration algorithm for computing the aforementioned optimal Kronecker product approximation. Across a variety of datasets and architectures, we empirically demonstrate that this is close to the optimal Kronecker product approximation. Additionally, for the Hessian approximation viewpoint, we empirically study the impact of various practical tricks to make Shampoo more computationally efficient (such as using the batch gradient and the empirical Fisher) on the quality of Hessian approximation.
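The power-iteration claim in the abstract can be checked numerically on toy data. The sketch below is illustrative and not taken from the paper. It assumes the Adagrad viewpoint, in which the target matrix is H = E[vec(G) vec(G)^T] for an m x n gradient matrix G, and it assumes Shampoo's factors are L = E[G G^T] and R = E[G^T G]. It also assumes the standard rearrangement trick, under which a Kronecker product becomes a rank-1 matrix, so that the optimal Kronecker approximation is the top singular pair. The function names (`rearrange`, `compare`, and so on) are hypothetical, the random gradients are synthetic, and scaling is sidestepped by comparing cosine similarities.

```python
import numpy as np


def rearrange(H, m, n):
    """Rearrange H so that a Kronecker product A (x) B becomes vec(A) vec(B)^T."""
    # H[(i,j),(k,l)] -> M[(i,k),(j,l)], using row-major vec of an m x n matrix.
    return H.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)


def cosine(A, B):
    """Cosine similarity under the Frobenius inner product (scale-invariant)."""
    return float(np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B)))


def psd_sqrt(A):
    """Symmetric square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def compare(grads):
    """grads: array of shape (num_samples, m, n) of sampled gradient matrices."""
    k, m, n = grads.shape
    g = grads.reshape(k, m * n)
    H = g.T @ g / k                                  # E[vec(G) vec(G)^T]
    L = np.einsum("kij,klj->il", grads, grads) / k   # E[G G^T]
    R = np.einsum("kij,kil->jl", grads, grads) / k   # E[G^T G]

    M = rearrange(H, m, n)
    # One power-iteration step started from the identity (assumed initialization).
    L_step = (M @ np.eye(n).reshape(-1)).reshape(m, m)
    R_step = (M.T @ np.eye(m).reshape(-1)).reshape(n, n)

    # Optimal Kronecker approximation: top singular pair of the rearranged matrix.
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    A_opt = U[:, 0].reshape(m, m)
    B_opt = Vt[0].reshape(n, n)

    return {
        "L_matches": bool(np.allclose(L, L_step)),
        "R_matches": bool(np.allclose(R, R_step)),
        "cos_optimal": cosine(H, np.kron(A_opt, B_opt)),
        "cos_L_kron_R": cosine(H, np.kron(L, R)),
        "cos_sqrtL_kron_sqrtR": cosine(H, np.kron(psd_sqrt(L), psd_sqrt(R))),
    }


# Synthetic Gaussian "gradients"; real gradients would come from a model.
rng = np.random.default_rng(0)
for name, value in compare(rng.standard_normal((64, 3, 4))).items():
    print(name, value)
```

Under these assumptions, `L_matches` and `R_matches` confirm that a single power-iteration step from the identity reproduces L and R, so it is L ⊗ R (the square of L^{1/2} ⊗ R^{1/2}) that is being compared against the optimal Kronecker approximation. The cosine values on synthetic data say nothing about the paper's empirical results.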
format Preprint
id arxiv_https___arxiv_org_abs_2406_17748
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle A New Perspective on Shampoo's Preconditioner
Morwani, Depen
Shapira, Itai
Vyas, Nikhil
Malach, Eran
Kakade, Sham
Janson, Lucas
Machine Learning
Optimization and Control
title A New Perspective on Shampoo's Preconditioner
topic Machine Learning
Optimization and Control
url https://arxiv.org/abs/2406.17748