Bibliographic Details
Main Authors: Rahamim, Adir; Saphra, Naomi; Kangaslahti, Sara; Belinkov, Yonatan
Format: Preprint
Published: 2024
Subjects: Machine Learning; Computation and Language
Online Access: https://arxiv.org/abs/2409.04206
author Rahamim, Adir
Saphra, Naomi
Kangaslahti, Sara
Belinkov, Yonatan
contents Parameter-efficient finetuning methods like low-rank adaptation (LoRA) aim to reduce the computational costs of finetuning pretrained Language Models (LMs). Enabled by these low-rank settings, we propose an even more efficient optimization strategy: Fast Forward, a simple and effective approach to accelerate large segments of training. In a Fast Forward stage, we repeat the most recent optimizer step until the loss stops improving on a tiny validation set. By alternating between regular optimization steps and Fast Forward stages, Fast Forward provides up to an 87% reduction in FLOPs and up to an 81% reduction in train time over standard SGD with Adam. We validate Fast Forward by finetuning various models on different tasks and demonstrate that it speeds up training without compromising model performance. Additionally, we analyze when and how to apply Fast Forward.
format Preprint
id arxiv_https___arxiv_org_abs_2409_04206
institution arXiv
publishDate 2024
record_format arxiv
title Fast Forwarding Low-Rank Training
topic Machine Learning
Computation and Language
url https://arxiv.org/abs/2409.04206
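note The abstract describes a Fast Forward stage as repeating the most recent optimizer step until the loss on a tiny validation set stops improving, alternated with regular optimization steps. The following is a minimal sketch of that idea under simplifying assumptions, not the authors' implementation: it uses plain gradient descent rather than Adam, operates on a single NumPy parameter vector rather than LoRA matrices, and all names (`fast_forward`, `train`, `max_repeats`, `regular_steps`) are hypothetical.

```python
import numpy as np


def fast_forward(params, delta, val_loss, max_repeats=100):
    """Re-apply the last update `delta` while the validation loss improves.

    Returns the updated parameters and the number of repeats accepted.
    `max_repeats` is a safety cap chosen for this sketch only.
    """
    best = val_loss(params)
    repeats = 0
    while repeats < max_repeats:
        candidate = params + delta
        loss = val_loss(candidate)
        if loss >= best:  # loss stopped improving: discard this candidate
            break
        params, best = candidate, loss
        repeats += 1
    return params, repeats


def train(grad, val_loss, params, lr=0.1, rounds=5, regular_steps=3):
    """Alternate regular gradient steps with Fast Forward stages.

    Assumes regular_steps >= 1 so that a most recent step exists.
    """
    for _ in range(rounds):
        for _ in range(regular_steps):
            previous = params
            params = params - lr * grad(params)  # plain gradient step (assumption)
        delta = params - previous  # the most recent optimizer step
        params, _ = fast_forward(params, delta, val_loss)
    return params
```

Each repeat inside `fast_forward` costs one validation-loss evaluation and no gradient computation, which is the kind of saving the abstract attributes to the method; the schedule of regular steps between stages is a free choice in this sketch.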