Staff View: Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Ling; Yang, Xianliang; Yu, Juwon; Cheonyoung, Park; Lee, Miran; Song, Lei; Bian, Jiang
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.14459

_version_	1866915757256867840
author	Zhang, Ling; Yang, Xianliang; Yu, Juwon; Cheonyoung, Park; Lee, Miran; Song, Lei; Bian, Jiang
author_facet	Zhang, Ling; Yang, Xianliang; Yu, Juwon; Cheonyoung, Park; Lee, Miran; Song, Lei; Bian, Jiang
contents	Fine-tuning large pretrained language models is a common approach for aligning them with human preferences, but noisy or off-target examples can dilute supervision. While small, well-chosen datasets often match the performance of much larger ones, systematic and efficient ways to identify high-value training data remain underexplored. Many current methods rely on heuristics or expensive retraining. We present a principled, resource-efficient framework for data selection and reweighting. At its core is an In-Context Approximation (ICA) that estimates the holdout loss a model would incur after training on a candidate example by conditioning on a small, curated holdout set in context. ICA requires no reference model and no additional finetuning. We define the resulting estimate as the ICA score, and derive per-example weights that dynamically reweight gradient updates as model parameters evolve. Across SFT, DPO, and SimPO, and over diverse backbones and datasets, ICA-based reweighting consistently improves model alignment with minimal overhead. We analyze sensitivity to score update frequency and the number of in-context holdout examples. We also discuss limitations in rapidly drifting on-policy settings, highlighting directions for future work. Code and prompts will be released.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_14459
institution	arXiv
publishDate	2025
record_format	arxiv
title	Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2510.14459

Similar Items