Staff View: ProDS: Preference-oriented Data Selection for Instruction Tuning :: Library Catalog


Bibliographic Details
Main Authors:	Guo, Wenya; Zhang, Zhengkun; Liu, Xumeng; Zhang, Ying; Lu, Ziyu; Zhu, Haoze; Liu, Xubo; Yan, Ruxue
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2505.12754

_version_	1866913846565797888
author	Guo, Wenya; Zhang, Zhengkun; Liu, Xumeng; Zhang, Ying; Lu, Ziyu; Zhu, Haoze; Liu, Xubo; Yan, Ruxue
author_facet	Guo, Wenya; Zhang, Zhengkun; Liu, Xumeng; Zhang, Ying; Lu, Ziyu; Zhu, Haoze; Liu, Xubo; Yan, Ruxue
contents	Instruction data selection aims to identify a high-quality subset of the training set that matches or exceeds the performance of the full dataset on target tasks. Existing methods focus on the instruction-to-response mapping but neglect human preferences across diverse responses. In this paper, we propose a Preference-oriented Data Selection method (ProDS) that scores training samples based on their alignment with preferences observed in the target set. Our key innovation lies in shifting the data selection criterion from merely estimating features for accurate response generation to explicitly aligning training samples with human preferences in target tasks. Specifically, direct preference optimization (DPO) is employed to estimate human preferences across diverse responses. In addition, a bidirectional preference synthesis strategy is designed to score training samples according to both positive and negative preferences. Extensive experimental results show that ProDS outperforms existing task-agnostic and targeted methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_12754
institution	arXiv
publishDate	2025
record_format	arxiv
title	ProDS: Preference-oriented Data Selection for Instruction Tuning
topic	Machine Learning
url	https://arxiv.org/abs/2505.12754
