Saved in:
Bibliographic Details
Main Authors: Xu, Xingyu, Shen, Yandi, Chi, Yuejie, Ma, Cong
Format: Preprint
Published: 2023
Subjects:
Online Access:https://arxiv.org/abs/2302.01186
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866912795492089856
author Xu, Xingyu
Shen, Yandi
Chi, Yuejie
Ma, Cong
author_facet Xu, Xingyu
Shen, Yandi
Chi, Yuejie
Ma, Cong
contents We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparametrized factor representations, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($λ$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overprameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($λ$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.
format Preprint
id arxiv_https___arxiv_org_abs_2302_01186
institution arXiv
publishDate 2023
record_format arxiv
spellingShingle The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing
Xu, Xingyu
Shen, Yandi
Chi, Yuejie
Ma, Cong
Machine Learning
Signal Processing
Optimization and Control
We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparametrized factor representations, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($λ$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overprameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($λ$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.
title The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing
topic Machine Learning
Signal Processing
Optimization and Control
url https://arxiv.org/abs/2302.01186