Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Ranaldi, Leonardo, Freitas, Andrè
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2405.00402
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866913666128936960
author	Ranaldi, Leonardo Freitas, Andrè
author_facet	Ranaldi, Leonardo Freitas, Andrè
contents	The alignments of reasoning abilities between smaller and larger Language Models are largely conducted via Supervised Fine-Tuning (SFT) using demonstrations generated from robust Large Language Models (LLMs). Although these approaches deliver more performant models, they do not show sufficiently strong generalization ability as the training only relies on the provided demonstrations. In this paper, we propose the Self-refine Instruction-tuning method that elicits Smaller Language Models to self-refine their abilities. Our approach is based on a two-stage process, where reasoning abilities are first transferred between LLMs and Small Language Models (SLMs) via Instruction-tuning on demonstrations provided by LLMs, and then the instructed models Self-refine their abilities through preference optimization strategies. In particular, the second phase operates refinement heuristics based on the Direct Preference Optimization algorithm, where the SLMs are elicited to deliver a series of reasoning paths by automatically sampling the generated responses and providing rewards using ground truths from the LLMs. Results obtained on commonsense and math reasoning tasks show that this approach significantly outperforms Instruction-tuning in both in-domain and out-domain scenarios, aligning the reasoning abilities of Smaller and Larger Language Models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_00402
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models Ranaldi, Leonardo Freitas, Andrè Computation and Language The alignments of reasoning abilities between smaller and larger Language Models are largely conducted via Supervised Fine-Tuning (SFT) using demonstrations generated from robust Large Language Models (LLMs). Although these approaches deliver more performant models, they do not show sufficiently strong generalization ability as the training only relies on the provided demonstrations. In this paper, we propose the Self-refine Instruction-tuning method that elicits Smaller Language Models to self-refine their abilities. Our approach is based on a two-stage process, where reasoning abilities are first transferred between LLMs and Small Language Models (SLMs) via Instruction-tuning on demonstrations provided by LLMs, and then the instructed models Self-refine their abilities through preference optimization strategies. In particular, the second phase operates refinement heuristics based on the Direct Preference Optimization algorithm, where the SLMs are elicited to deliver a series of reasoning paths by automatically sampling the generated responses and providing rewards using ground truths from the LLMs. Results obtained on commonsense and math reasoning tasks show that this approach significantly outperforms Instruction-tuning in both in-domain and out-domain scenarios, aligning the reasoning abilities of Smaller and Larger Language Models.
title	Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models
topic	Computation and Language
url	https://arxiv.org/abs/2405.00402

Similar Items