Bibliographic Details
Main Authors: Wang, Jing; Shen, Jie; Foster, Dean; Karnin, Zohar; Weiss, Jeremy C
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2602.00952
_version_ 1866918317831225344
author Wang, Jing
Shen, Jie
Foster, Dean
Karnin, Zohar
Weiss, Jeremy C
contents The trade-off between labeled data availability and downstream accuracy remains a central challenge in fine-tuning large language models (LLMs). We propose a principled framework for budget-aware supervised fine-tuning by casting LLM adaptation as a contextual Stackelberg game. In our formulation, the learner (leader) commits to a scoring policy and a label-querying strategy, while an adaptive environment (follower) selects challenging supervised alternatives in response. To explicitly address label efficiency, we incorporate a finite supervision budget directly into the learning objective. Our algorithm operates in the full-feedback regime and achieves $\tilde{O}(d\sqrt{T})$ regret under standard linear contextual assumptions. We extend the framework with a Largest-Latency-First (LLF) confidence gate that selectively queries labels, achieving a budget-aware regret bound of $\tilde{O}(\sqrt{dB} + c\sqrt{B})$ with $B=\beta T$.
format Preprint
id arxiv_https___arxiv_org_abs_2602_00952
institution arXiv
publishDate 2026
record_format arxiv
title Optimal Budgeted Adaptation of Large Language Models
topic Machine Learning
url https://arxiv.org/abs/2602.00952