Bibliographic Details
Main Authors: Zhang, Honggen; Zhao, Xufeng; Molybog, Igor; Zhang, June
Format: Preprint
Published: 2024
Subjects: Computation and Language, Artificial Intelligence
Online Access: https://arxiv.org/abs/2409.17169
author Zhang, Honggen
Zhao, Xufeng
Molybog, Igor
Zhang, June
contents Aligning large language models (LLMs) to human preferences is a crucial step in building helpful and safe AI tools, and it usually involves training on supervised datasets. Popular algorithms such as Direct Preference Optimization (DPO) rely on pairs of AI-generated responses ranked according to human annotation. Annotating response pairs may introduce human bias, and building a correct preference dataset is a costly part of the alignment pipeline. To improve annotation efficiency and quality in LLM alignment, we propose REAL: Response Embedding-based Alignment for LLMs, a strategy for constructing a high-quality training dataset by selecting, from a set of response candidates, the less ambiguous preference pairs for labeling. Our selection is based on the similarity of response embeddings computed independently of the prompts, which keeps the selection process off-policy and avoids adaptively measuring similarity during training. Experimental results on the real-world SHP2 dataset and synthetic HH-RLHF benchmarks indicate that choosing dissimilar response pairs enhances the direct alignment of LLMs while reducing inherited labeling errors. The model aligned with dissimilar response pairs obtained a better margin and win rate on the dialogue task. Our findings suggest that focusing on distinct pairs can reduce label errors and improve LLM alignment efficiency, saving up to 65% of annotators' work.
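The selection idea summarized above (pick dissimilar response pairs using only response embeddings, without the prompt) can be illustrated with a minimal sketch. It assumes each response candidate has already been embedded as a vector by some embedding model (the record does not name one), and it assumes cosine similarity with a "take the single least similar pair" rule; the function names are hypothetical, and this is not presented as the authors' exact procedure.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_most_dissimilar_pair(embeddings: Sequence[Sequence[float]]) -> tuple[int, int]:
    """Hypothetical helper: return indices (i, j), i < j, of the two response
    embeddings with the lowest cosine similarity.

    Only the response embeddings are used (no prompt), so the choice can be
    made once, before training, rather than recomputed during training.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two response candidates")
    # Score every unordered pair and keep the least similar one.
    return min(
        combinations(range(len(embeddings)), 2),
        key=lambda ij: cosine_similarity(embeddings[ij[0]], embeddings[ij[1]]),
    )


if __name__ == "__main__":
    # Toy 2-D "embeddings" for three candidate responses to one prompt.
    candidates = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]
    print(select_most_dissimilar_pair(candidates))  # prints (0, 2)
```

In this toy input, responses 0 and 1 point in nearly the same direction, while response 2 points roughly opposite to response 0, so the pair (0, 2) would be the one sent for preference labeling. An exhaustive pairwise scan is quadratic in the number of candidates, which is acceptable for the small candidate sets typical of one prompt.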
format Preprint
id arxiv_https___arxiv_org_abs_2409_17169
institution arXiv
publishDate 2024
record_format arxiv
title REAL: Response Embedding-based Alignment for LLMs
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2409.17169