Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Xu, Xingyu, Shen, Yandi, Chi, Yuejie, Ma, Cong
Format:	Preprint
Published:	2023
Subjects:	Machine Learning Signal Processing Optimization and Control
Online Access:	https://arxiv.org/abs/2302.01186
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866912795492089856
author	Xu, Xingyu Shen, Yandi Chi, Yuejie Ma, Cong
author_facet	Xu, Xingyu Shen, Yandi Chi, Yuejie Ma, Cong
contents	We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparametrized factor representations, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($λ$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overprameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($λ$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.
format	Preprint
id	arxiv_https___arxiv_org_abs_2302_01186
institution	arXiv
publishDate	2023
record_format	arxiv
spellingShingle	The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing Xu, Xingyu Shen, Yandi Chi, Yuejie Ma, Cong Machine Learning Signal Processing Optimization and Control We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparametrized factor representations, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($λ$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overprameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($λ$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.
title	The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing
topic	Machine Learning Signal Processing Optimization and Control
url	https://arxiv.org/abs/2302.01186

Similar Items