Bibliographic Details
Main Authors: Guo, Wenya; Zhang, Zhengkun; Liu, Xumeng; Zhang, Ying; Lu, Ziyu; Zhu, Haoze; Liu, Xubo; Yan, Ruxue
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2505.12754
author Guo, Wenya
Zhang, Zhengkun
Liu, Xumeng
Zhang, Ying
Lu, Ziyu
Zhu, Haoze
Liu, Xubo
Yan, Ruxue
contents Instruction data selection aims to identify a high-quality subset of the training set that matches or exceeds the performance of the full dataset on target tasks. Existing methods focus on the instruction-to-response mapping but neglect human preferences among diverse responses. In this paper, we propose a Preference-oriented Data Selection method (ProDS) that scores training samples by their alignment with the preferences observed in the target set. Our key innovation is to shift the data selection criterion from merely estimating features for accurate response generation to explicitly aligning training samples with human preferences in target tasks. Specifically, direct preference optimization (DPO) is employed to estimate human preferences across diverse responses. In addition, a bidirectional preference synthesis strategy is designed to score training samples according to both positive and negative preferences. Extensive experimental results demonstrate the superiority of ProDS over existing task-agnostic and targeted methods.
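The abstract names direct preference optimization (DPO) as the preference estimator but does not say how ProDS turns it into a sample score. For orientation only, the widely used per-pair DPO objective can be sketched as below. The function name, argument names, and the default `beta` are illustrative assumptions and are not taken from this record; this is not the ProDS scoring rule itself.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) response pair.

    Inputs are sequence log-probabilities under the trained policy and a
    frozen reference model. All names here are hypothetical placeholders.
    """
    # Scaled difference of the policy-vs-reference log-ratios.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Loss is -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss falls below log 2 when the policy favours the chosen response more strongly than the reference model does, and rises above it in the opposite case.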
format Preprint
id arxiv_https___arxiv_org_abs_2505_12754
institution arXiv
publishDate 2025
record_format arxiv
title ProDS: Preference-oriented Data Selection for Instruction Tuning
topic Machine Learning
url https://arxiv.org/abs/2505.12754