Staff View: Self-Taught Evaluators :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Tianlu; Kulikov, Ilia; Golovneva, Olga; Yu, Ping; Yuan, Weizhe; Dwivedi-Yu, Jane; Pang, Richard Yuanzhe; Fazel-Zarandi, Maryam; Weston, Jason; Li, Xian
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.02666

_version_	1866913461423833088
author	Wang, Tianlu; Kulikov, Ilia; Golovneva, Olga; Yu, Ping; Yuan, Weizhe; Dwivedi-Yu, Jane; Pang, Richard Yuanzhe; Fazel-Zarandi, Maryam; Weston, Jason; Li, Xian
author_facet	Wang, Tianlu; Kulikov, Ilia; Golovneva, Olga; Yu, Ping; Yuan, Weizhe; Dwivedi-Yu, Jane; Pang, Richard Yuanzhe; Fazel-Zarandi, Maryam; Weston, Jason; Li, Xian
contents	Model-based evaluation is at the heart of successful model development -- as a reward model for training, and as a replacement for human evaluation. To train such evaluators, the standard approach is to collect a large amount of human preference judgments over model responses, which is costly and the data becomes stale as models improve. In this work, we present an approach that aims to improve evaluators without human annotations, using synthetic training data only. Starting from unlabeled instructions, our iterative self-improvement scheme generates contrasting model outputs and trains an LLM-as-a-Judge to produce reasoning traces and final judgments, repeating this training at each new iteration using the improved predictions. Without any labeled preference data, our Self-Taught Evaluator can improve a strong LLM (Llama3-70B-Instruct) from 75.4 to 88.3 (88.7 with majority vote) on RewardBench. This outperforms commonly used LLM judges such as GPT-4 and matches the performance of the top-performing reward models trained with labeled examples.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_02666
institution	arXiv
publishDate	2024
record_format	arxiv
title	Self-Taught Evaluators
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2408.02666

Similar Items