Staff View :: Library Catalog


Bibliographic Details
Main Authors:	Guo, Geyang; Zhao, Ranchi; Tang, Tianyi; Zhao, Wayne Xin; Wen, Ji-Rong
Format:	Preprint
Published:	2023
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2311.04072

_version_	1866929314078916608
author	Guo, Geyang; Zhao, Ranchi; Tang, Tianyi; Zhao, Wayne Xin; Wen, Ji-Rong
author_facet	Guo, Geyang; Zhao, Ranchi; Tang, Tianyi; Zhao, Wayne Xin; Wen, Ji-Rong
contents	Alignment with human preference is a desired property of large language models (LLMs). Currently, the main alignment approach is based on reinforcement learning from human feedback (RLHF). Despite its effectiveness, RLHF is intricate to implement and train, so recent studies explore alternative alignment approaches based on supervised fine-tuning (SFT). A major limitation of SFT is that it essentially performs imitation learning, which cannot fully capture what the expected behaviors are. To address this issue, we propose an improved alignment approach named FIGA. Unlike prior methods, we incorporate fine-grained (i.e., token- or phrase-level) quality signals that are derived by contrasting good and bad responses. Our approach makes two major contributions. First, we curate a refined alignment dataset that pairs initial responses with the corresponding revised ones. Second, we devise a new loss function that can leverage fine-grained quality signals to instruct the learning of LLMs for alignment. Extensive experiments demonstrate the effectiveness of our approach in comparison with a number of competitive baselines.
format	Preprint
id	arxiv_https___arxiv_org_abs_2311_04072
institution	arXiv
publishDate	2023
record_format	arxiv
title	Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment
topic	Computation and Language
url	https://arxiv.org/abs/2311.04072
