Bibliographic Details
Main Authors: Yang, Yanlai; Jones, Matt; Mozer, Michael C.; Ren, Mengye
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2403.09613
Abstract:
  • We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when training on a sequence of documents; however, we discover a curious and remarkable property of LLMs fine-tuned sequentially in this setting: they exhibit anticipatory behavior, recovering from forgetting on documents before encountering them again. This behavior occurs even though the documents are never presented in context together. The behavior emerges and becomes more robust as the architecture scales up its number of parameters. Through comprehensive experiments and visualizations, we demonstrate a new mechanism by which over-parameterized neural networks can recover from catastrophic interference, and we uncover new insights into training over-parameterized networks in cyclically structured environments.
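
The cyclic training protocol described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' code: a small linear model stands in for an LLM, each "document" is a single hypothetical (input, target) pair, and the update is a normalized least-squares step. The sketch only shows how one might train on documents in a fixed, repeated order and log the loss on every document after each training episode; it is not expected to reproduce the anticipatory recovery reported for large models.

```python
import random


def make_documents(num_docs, dim, seed=0):
    """Create toy 'documents': each is one (input vector, target) pair."""
    rng = random.Random(seed)
    return [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.gauss(0.0, 1.0))
        for _ in range(num_docs)
    ]


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w, doc):
    """Squared-error loss of the linear model on a single document."""
    x, y = doc
    return 0.5 * (predict(w, x) - y) ** 2


def train_step(w, doc, lr):
    """Normalized gradient step: shrinks this document's error by (1 - lr)."""
    x, y = doc
    err = predict(w, x) - y
    norm_sq = sum(xi * xi for xi in x)
    return [wi - lr * err * xi / norm_sq for wi, xi in zip(w, x)]


def cyclic_training(docs, num_epochs, steps_per_doc=1, lr=0.5):
    """Train on the documents in a fixed order, repeated num_epochs times.

    Returns history, where history[t][j] is the loss on document j
    measured right after the t-th training episode.
    """
    w = [0.0] * len(docs[0][0])
    history = []
    for _ in range(num_epochs):
        for doc in docs:  # same order in every epoch
            for _ in range(steps_per_doc):
                w = train_step(w, doc, lr)
            history.append([loss(w, d) for d in docs])
    return history


if __name__ == "__main__":
    docs = make_documents(num_docs=4, dim=8)
    history = cyclic_training(docs, num_epochs=3)
    # Column 0 tracks the loss on document 0 across all episodes.
    for t, row in enumerate(history):
        print(t, round(row[0], 4))
```

Reading one column of `history` over time gives the loss curve for a single document between its successive presentations; in the setting the abstract describes, anticipatory recovery would appear as that curve falling again before the document's next turn in the cycle.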