Staff View: Incentivizing User Data Contributions for LLM Improvement under Withdrawal Rights :: Library Catalog

Bibliographic Details
Main Authors:	Feng, Di; Zhang, Chenhao; Zhao, Zhanzhan
Format:	Preprint
Published:	2026
Subjects:	Computer Science and Game Theory
Online Access:	https://arxiv.org/abs/2605.07419

_version_	1866911661904887808
author	Feng, Di; Zhang, Chenhao; Zhao, Zhanzhan
author_facet	Feng, Di; Zhang, Chenhao; Zhao, Zhanzhan
contents	The continued improvement of large language models (LLMs) increasingly depends on eliciting high-quality, user-generated data, yet such data are costly to provide and often withheld due to privacy and effort concerns. This creates a fundamental design challenge: how to incentivize data contribution when model improvements require coordinated, threshold-level inputs, while contributions remain privately costly and partially reversible. We develop and theoretically analyze incentive mechanisms for user data contribution that explicitly account for threshold effects and reversibility, focusing on how subsidies and withdrawal rights can be jointly designed to overcome coordination failure. As a natural benchmark, we first consider subsidy-based incentives, under which users respond to posted payments with privately optimal floor contributions. These decentralized responses may fall below the improvement threshold, resulting in subsidy expenditure without model improvements. We then analyze mechanisms with withdrawal rights, in which users report costs, the provider centrally assigns contribution burdens, and users may withdraw before training. We prove that combining cost reporting with personalized assignment can eliminate inefficient provision by ensuring that data are collected only when improvement is sustainable, converting infeasible instances into a null outcome rather than subsidy leakage. Finally, we compare two withdrawal protocols. The simultaneous protocol can achieve lower total cost, while the small-first sequential protocol better incentivizes participation, encouraging greater data provision and thereby increasing the probability of crossing the improvement threshold.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_07419
institution	arXiv
publishDate	2026
record_format	arxiv
title	Incentivizing User Data Contributions for LLM Improvement under Withdrawal Rights
topic	Computer Science and Game Theory
url	https://arxiv.org/abs/2605.07419

Similar Items