Staff View: Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods :: Library Catalog

Bibliographic Details
Main Authors:	Hao, Yifan; Pan, Xingyuan; Zhang, Hanning; Ye, Chenlu; Pan, Rui; Zhang, Tong
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.01901

_version_	1866918043072856064
author	Hao, Yifan; Pan, Xingyuan; Zhang, Hanning; Ye, Chenlu; Pan, Rui; Zhang, Tong
author_facet	Hao, Yifan; Pan, Xingyuan; Zhang, Hanning; Ye, Chenlu; Pan, Rui; Zhang, Tong
contents	Supervised fine-tuning (SFT) on domain-specific data is the dominant approach for adapting foundation models to specialized tasks. However, SFT models tend to forget knowledge acquired during pretraining. In vision models, ensembling a pretrained model with its fine-tuned counterpart has been shown to mitigate this issue. In this work, we demonstrate that the same holds for language models and, more strikingly, we observe an overadaptation phenomenon: the ensemble model not only retains general knowledge from the foundation model but also outperforms the fine-tuned model on the fine-tuning domain itself. Despite the empirical success of ensembling, its benefits remain theoretically underexplored. We develop a formal theoretical analysis of the overadaptation phenomenon. Ensembling mitigates overadaptation by balancing two primary sources of error: bias, caused by insufficient fine-tuning, and variance, introduced by overfitting to the fine-tuning data. While regularization techniques aim to address this trade-off, we show that ensembling provides a more effective solution. We analyze this phenomenon in over-parameterized linear settings and demonstrate that interpolating between pretrained and fine-tuned weights significantly improves performance. These findings offer theoretical justification for the observed advantages of model ensembling and are supported by empirical experiments consistent with our analysis.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_01901
institution	arXiv
publishDate	2025
record_format	arxiv
title	Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
topic	Artificial Intelligence
url	https://arxiv.org/abs/2506.01901
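
Illustrative note (not part of the catalog record): the abstract describes interpolating between pretrained and fine-tuned weights in an over-parameterized linear setting. The sketch below shows what such a weight-space ensemble might look like. Everything in it is an assumption made for illustration and is not taken from the preprint: the function names, the mixing coefficient `tau`, the synthetic data, and the use of a minimum-norm update as a stand-in for fine-tuning.

```python
import numpy as np


def interpolate_weights(theta_pre, theta_ft, tau):
    """Weight-space ensemble: (1 - tau) * pretrained + tau * fine-tuned.

    `tau` is a hypothetical mixing coefficient in [0, 1].
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return (1.0 - tau) * theta_pre + tau * theta_ft


def fine_tune_min_norm(theta_pre, X, y):
    """Assumed stand-in for fine-tuning: fit (X, y) exactly while moving
    as little as possible (in Euclidean norm) away from theta_pre."""
    residual = y - X @ theta_pre
    return theta_pre + np.linalg.pinv(X) @ residual


# Synthetic over-parameterized problem: more parameters (d) than samples (n).
rng = np.random.default_rng(0)
n, d = 20, 100
theta_true = rng.normal(size=d)                     # hypothetical target weights
theta_pre = theta_true + 0.5 * rng.normal(size=d)   # imperfect "pretrained" weights
X = rng.normal(size=(n, d))
y = X @ theta_true + 0.5 * rng.normal(size=n)       # noisy fine-tuning labels

theta_ft = fine_tune_min_norm(theta_pre, X, y)


def excess_risk(theta):
    # With isotropic Gaussian test inputs, excess risk is the squared parameter error.
    return float(np.sum((theta - theta_true) ** 2))


# Excess risk along the interpolation path between the two endpoints.
risks = {
    tau: excess_risk(interpolate_weights(theta_pre, theta_ft, tau))
    for tau in (0.0, 0.25, 0.5, 0.75, 1.0)
}
for tau, risk in risks.items():
    print(f"tau={tau:.2f}  excess risk={risk:.3f}")
```

Setting `tau=0` recovers the pretrained weights and `tau=1` the fine-tuned weights. Whether an intermediate `tau` gives lower risk depends on the noise level and on how far the pretrained weights are from the target, which is the bias-variance balance the abstract refers to.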

Similar Items