MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Cui, Ziyun; Zhang, Ziyang; Sun, Guangzhi; Wu, Wen; Zhang, Chao
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2406.03199

_version_	1866929754587791360
author	Cui, Ziyun; Zhang, Ziyang; Sun, Guangzhi; Wu, Wen; Zhang, Chao
author_facet	Cui, Ziyun; Zhang, Ziyang; Sun, Guangzhi; Wu, Wen; Zhang, Chao
contents	Advances in large language models raise the question of how alignment techniques will adapt as models become increasingly complex and humans will only be able to supervise them weakly. Weak-to-Strong mimics such a scenario where weak model supervision attempts to harness the full capabilities of a much stronger model. This work extends Weak-to-Strong to WeakS-to-Strong by exploring an ensemble of weak models which simulate the variability in human opinions. Confidence scores are estimated using a Bayesian approach to guide the WeakS-to-Strong generalization. Furthermore, we extend the application of WeakS-to-Strong from text classification tasks to text generation tasks where more advanced strategies are investigated for supervision. Moreover, direct preference optimization is applied to advance the student model's preference learning, beyond the basic learning framework of teacher forcing. Results demonstrate the effectiveness of the proposed approach for the reliability of a strong student model, showing potential for superalignment.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_03199
institution	arXiv
publishDate	2024
record_format	arxiv
title	Bayesian WeakS-to-Strong from Text Classification to Generation
topic	Computation and Language; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2406.03199
