Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Yano, Kazuki, Ito, Takumi, Suzuki, Jun
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2504.04151
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

Pre-training large language models (LLMs) faces significant memory challenges due to the large size of model parameters. We introduce STaged parameter-Efficient Pre-training (STEP), which integrates parameter-efficient tuning techniques with model growth. We conduct experiments on pre-training LLMs of various sizes and demonstrate that STEP achieves up to a 53.9% reduction in maximum memory requirements compared to vanilla pre-training while maintaining equivalent performance. Furthermore, we show that the model by STEP performs comparably to vanilla pre-trained models on downstream tasks after instruction tuning.

Similar Items