Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Le, Long Tan; Shu, Han; Nguyen, Tung-Anh; Hong, Choong Seon; Tran, Nguyen H.
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2405.15230

_version_	1866914994163023872
author	Le, Long Tan; Shu, Han; Nguyen, Tung-Anh; Hong, Choong Seon; Tran, Nguyen H.
author_facet	Le, Long Tan; Shu, Han; Nguyen, Tung-Anh; Hong, Choong Seon; Tran, Nguyen H.
contents	While astonishingly capable, Large Language Models (LLMs) can sometimes produce outputs that deviate from human expectations. Such deviations necessitate an alignment phase to prevent the dissemination of untruthful, toxic, or biased information. Traditional alignment methods based on reinforcement learning often suffer from instability, whereas preference optimization methods are limited by their tendency to overfit pre-collected hard-label datasets. In this paper, we propose a novel LLM alignment framework named $i$REPO, which utilizes implicit Reward pairwise difference regression for Empirical Preference Optimization. In particular, $i$REPO employs self-generated datasets labeled by empirical human (or AI annotator) preferences to iteratively refine the aligned policy through a novel regression-based loss function. Furthermore, we introduce an innovative algorithm backed by theoretical guarantees: it achieves optimal results under ideal assumptions and comes with a practical performance-gap result when those assumptions do not hold. Experimental results with Phi-2 and Mistral-7B demonstrate that $i$REPO effectively achieves self-alignment using soft-label, self-generated responses and the logit of empirical AI annotators. Moreover, our approach surpasses preference optimization baselines in evaluations using the Language Model Evaluation Harness and Multi-turn benchmarks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_15230
institution	arXiv
publishDate	2024
record_format	arxiv
title	$i$REPO: $i$mplicit Reward Pairwise Difference based Empirical Preference Optimization
topic	Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2405.15230

Similar Items