| Main Authors: | Guo, Wenya; Zhang, Zhengkun; Liu, Xumeng; Zhang, Ying; Lu, Ziyu; Zhu, Haoze; Liu, Xubo; Yan, Ruxue |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning |
| Online Access: | https://arxiv.org/abs/2505.12754 |
| _version_ | 1866913846565797888 |
|---|---|
| author | Guo, Wenya; Zhang, Zhengkun; Liu, Xumeng; Zhang, Ying; Lu, Ziyu; Zhu, Haoze; Liu, Xubo; Yan, Ruxue |
| author_facet | Guo, Wenya; Zhang, Zhengkun; Liu, Xumeng; Zhang, Ying; Lu, Ziyu; Zhu, Haoze; Liu, Xubo; Yan, Ruxue |
| contents | Instruction data selection aims to identify a high-quality subset of the training set that matches or exceeds the performance of the full dataset on target tasks. Existing methods focus on the instruction-to-response mapping but neglect human preferences among diverse responses. In this paper, we propose a Preference-oriented Data Selection method (ProDS) that scores training samples by their alignment with the preferences observed in the target set. Our key innovation lies in shifting the data selection criterion from merely estimating features for accurate response generation to explicitly aligning training samples with human preferences in target tasks. Specifically, direct preference optimization (DPO) is employed to estimate human preferences across diverse responses. In addition, a bidirectional preference synthesis strategy is designed to score training samples according to both positive and negative preferences. Extensive experimental results demonstrate that ProDS outperforms existing task-agnostic and targeted methods. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2505_12754 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | ProDS: Preference-oriented Data Selection for Instruction Tuning |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2505.12754 |