Staff View: Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Maura-Rivero, Roberto-Rafael; Nagpal, Chirag; Patel, Roma; Visin, Francesco
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language; General Economics; Economics
Online Access:	https://arxiv.org/abs/2501.06248

_version_	1866916629686779904
author	Maura-Rivero, Roberto-Rafael; Nagpal, Chirag; Patel, Roma; Visin, Francesco
author_facet	Maura-Rivero, Roberto-Rafael; Nagpal, Chirag; Patel, Roma; Visin, Francesco
contents	Current methods that train large language models (LLMs) with reinforcement learning feedback often resort to averaging the outputs of multiple reward functions during training. This overlooks crucial aspects of individual reward dimensions and inter-reward dependencies, which can lead to sub-optimal outcomes in generations. In this work, we show how linear aggregation of rewards exhibits some vulnerabilities that can lead to undesired properties of generated text. We then propose a transformation of reward functions inspired by the economic theory of utility functions (specifically the Inada conditions) that enhances sensitivity to low reward values while diminishing sensitivity to already high values. We compare our approach to existing baseline methods that linearly aggregate rewards and show how the Inada-inspired reward feedback is superior to traditional weighted averaging. We quantitatively and qualitatively analyse the differences between the methods and find that models trained with Inada transformations score as more helpful while being less harmful.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_06248
institution	arXiv
publishDate	2025
record_format	arxiv
title	Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models
topic	Machine Learning; Artificial Intelligence; Computation and Language; General Economics; Economics
url	https://arxiv.org/abs/2501.06248
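
The abstract in this record contrasts linear averaging of rewards with a concave, utility-style transformation. The following is a minimal sketch of that contrast, not the authors' implementation: the square-root transform, the function names, and the example reward values are all assumptions chosen for illustration. The square root is used only because its slope grows without bound near zero and vanishes for large inputs, which is the behaviour the Inada conditions describe.

```python
import math


def linear_aggregate(rewards):
    """Baseline: plain average of the per-dimension rewards."""
    return sum(rewards) / len(rewards)


def concave_transform(reward):
    """Hypothetical Inada-style transform (an assumption, not the paper's formula).

    sqrt is steep near 0 and flat for large values, so low rewards
    matter more than further gains on already high rewards.
    """
    return math.sqrt(max(reward, 0.0))


def transformed_aggregate(rewards):
    """Average of the transformed per-dimension rewards."""
    return sum(concave_transform(r) for r in rewards) / len(rewards)


# Illustrative rewards as (helpfulness, harmlessness); the values are made up.
lopsided = [0.9, 0.1]   # very helpful, but scores poorly on harmlessness
balanced = [0.5, 0.5]   # moderate on both dimensions

print("linear:     ", linear_aggregate(lopsided), linear_aggregate(balanced))
print("transformed:", transformed_aggregate(lopsided), transformed_aggregate(balanced))
```

Under these assumptions the linear average cannot distinguish the two responses, while the transformed aggregate ranks the balanced response higher because the low harmlessness score is penalised more heavily.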

Similar Items