Staff View: Dynamic Multi-Reward Weighting for Multi-Style Controllable Generation :: Library Catalog

Bibliographic Details
Main Authors:	de Langis, Karin; Koo, Ryan; Kang, Dongyeop
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2402.14146

_version_	1866909694961909760
author	de Langis, Karin; Koo, Ryan; Kang, Dongyeop
author_facet	de Langis, Karin; Koo, Ryan; Kang, Dongyeop
contents	Textual style expresses a diverse set of information, including interpersonal dynamics (e.g., formality) and the author's emotions or attitudes (e.g., disgust). An open question is how language models can be explicitly controlled so that they weave together target styles when generating text: for example, to produce text that is both negative and non-toxic. One approach to such controlled generation is multi-objective reinforcement learning (RL), but how best to combine multiple objectives in a reward function is an open question. In this paper, we investigate various formulations of multi-style rewards, including calibrated outputs from discriminators and dynamic weighting by discriminator gradient magnitudes. We find that our proposed dynamic weighting outperforms static weighting approaches with respect to style control while maintaining linguistic quality, and we explore its effectiveness in 2- and 3-style control.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_14146
institution	arXiv
publishDate	2024
record_format	arxiv
title	Dynamic Multi-Reward Weighting for Multi-Style Controllable Generation
topic	Computation and Language
url	https://arxiv.org/abs/2402.14146
