Gespeichert in:
| Hauptverfasser: | , |
|---|---|
| Format: | Preprint |
| Veröffentlicht: |
2022
|
| Schlagworte: | |
| Online-Zugang: | https://arxiv.org/abs/2206.00395 |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Inhaltsangabe:
- We investigate the fundamental optimization question of minimizing a target function $f$, whose gradients are expensive to compute or have limited availability, given access to some auxiliary side function $h$ whose gradients are cheap or more available. This formulation captures many settings of practical relevance, such as i) re-using batches in SGD, ii) transfer learning, iii) federated learning, iv) training with compressed models/dropout, Et cetera. We propose two generic new algorithms that apply in all these settings; we also prove that we can benefit from this framework under the Hessian similarity assumption between the target and side information. A benefit is obtained when this similarity measure is small; we also show a potential benefit from stochasticity when the auxiliary noise is correlated with that of the target function.