Saved in:
| Main Authors: | , , , |
|---|---|
| Format: | Preprint |
| Published: |
2023
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2302.01186 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866912795492089856 |
|---|---|
| author | Xu, Xingyu Shen, Yandi Chi, Yuejie Ma, Cong |
| author_facet | Xu, Xingyu Shen, Yandi Chi, Yuejie Ma, Cong |
| contents | We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparametrized factor representations, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($λ$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overprameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($λ$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2302_01186 |
| institution | arXiv |
| publishDate | 2023 |
| record_format | arxiv |
| spellingShingle | The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing Xu, Xingyu Shen, Yandi Chi, Yuejie Ma, Cong Machine Learning Signal Processing Optimization and Control We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparametrized factor representations, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($λ$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overprameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($λ$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning. |
| title | The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing |
| topic | Machine Learning Signal Processing Optimization and Control |
| url | https://arxiv.org/abs/2302.01186 |