Staff View: MAVRL: Learning Reward Functions from Multiple Feedback Types with Amortized Variational Inference :: Library Catalog

Bibliographic Details
Main Authors:	Baur, Raphaël; Metz, Yannick; Gkoulta, Maria; El-Assady, Mennatallah; Ramponi, Giorgia; Buening, Thomas Kleine
Format:	Preprint
Published:	2026
Subjects:	Machine Learning, Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.15206

_version_	1866908837351522304
author	Baur, Raphaël; Metz, Yannick; Gkoulta, Maria; El-Assady, Mennatallah; Ramponi, Giorgia; Buening, Thomas Kleine
contents	Reward learning typically relies on a single feedback type or combines multiple feedback types using manually weighted loss terms. Currently, it remains unclear how to jointly learn reward functions from heterogeneous feedback types such as demonstrations, comparisons, ratings, and stops that provide qualitatively different signals. We address this challenge by formulating reward learning from multiple feedback types as Bayesian inference over a shared latent reward function, where each feedback type contributes information through an explicit likelihood. We introduce a scalable amortized variational inference approach that learns a shared reward encoder and feedback-specific likelihood decoders and is trained by optimizing a single evidence lower bound. Our approach avoids reducing feedback to a common intermediate representation and eliminates the need for manual loss balancing. Across discrete and continuous-control benchmarks, we show that jointly inferred reward posteriors outperform single-type baselines, exploit complementary information across feedback types, and yield policies that are more robust to environment perturbations. The inferred reward uncertainty further provides interpretable signals for analyzing model confidence and consistency across feedback types.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_15206
institution	arXiv
publishDate	2026
record_format	arxiv
title	MAVRL: Learning Reward Functions from Multiple Feedback Types with Amortized Variational Inference
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2602.15206
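The abstract frames reward learning from several feedback types as one evidence lower bound (ELBO) over a shared latent reward, with one explicit likelihood per feedback type. The Python sketch below is a rough illustration of that idea and is not the MAVRL implementation. It rests on several assumptions that are not taken from the paper. It uses a linear encoder that produces a factorized Gaussian over per-state rewards, with a standard-normal prior. It uses a Bradley-Terry likelihood for comparisons and a Gaussian likelihood for ratings. It represents trajectories as lists of state indices. All function names (`encode`, `elbo`, `kl_to_standard_normal`, `log_sigmoid`) and parameters (`rating_noise`, `n_samples`) are hypothetical.

```python
import math
import random

# Hypothetical sketch of a multi-feedback ELBO; NOT the MAVRL implementation.

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def encode(features, w_mu, w_logstd):
    """Assumed linear encoder: maps each state's features to the mean and
    log standard deviation of a Gaussian belief over that state's reward."""
    mu = [sum(f * w for f, w in zip(phi, w_mu)) for phi in features]
    log_std = [sum(f * w for f, w in zip(phi, w_logstd)) for phi in features]
    return mu, log_std

def kl_to_standard_normal(mu, log_std):
    """KL(N(m, s^2) || N(0, 1)), summed over states."""
    return sum(0.5 * (m * m + math.exp(2 * ls) - 1.0) - ls
               for m, ls in zip(mu, log_std))

def trajectory_return(rewards, traj):
    """Return of a trajectory given as a list of state indices."""
    return sum(rewards[s] for s in traj)

def elbo(features, comparisons, ratings, w_mu, w_logstd,
         rating_noise=1.0, n_samples=64, seed=0):
    """Monte Carlo ELBO: expected log-likelihood of all feedback minus KL.

    comparisons: list of (preferred_traj, other_traj) pairs
    ratings:     list of (traj, score) pairs
    """
    rng = random.Random(seed)
    mu, log_std = encode(features, w_mu, w_logstd)
    expected_ll = 0.0
    for _ in range(n_samples):
        # Reparameterized sample of every state's reward
        r = [m + math.exp(ls) * rng.gauss(0.0, 1.0)
             for m, ls in zip(mu, log_std)]
        # Comparison likelihood (assumed Bradley-Terry on returns)
        for preferred, other in comparisons:
            diff = trajectory_return(r, preferred) - trajectory_return(r, other)
            expected_ll += log_sigmoid(diff) / n_samples
        # Rating likelihood (assumed Gaussian around the return)
        for traj, score in ratings:
            z = (score - trajectory_return(r, traj)) / rating_noise
            log_density = -0.5 * z * z - math.log(
                rating_noise * math.sqrt(2 * math.pi))
            expected_ll += log_density / n_samples
    # One objective: feedback terms are summed, with no hand-set loss weights
    return expected_ll - kl_to_standard_normal(mu, log_std)

if __name__ == "__main__":
    # Toy data: 5 states with one-hot features
    feats = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
    comparisons = [([0, 1], [2, 3])]  # trajectory [0, 1] preferred over [2, 3]
    ratings = [([4], 0.5)]            # trajectory [4] rated 0.5
    print(elbo(feats, comparisons, ratings, [0.0] * 5, [0.0] * 5))
```

Every feedback type enters as a log-likelihood term of the same objective. Their relative influence therefore comes from the assumed noise models, such as `rating_noise` and the unit slope of the sigmoid, rather than from separately tuned loss weights. Demonstrations and stops would each need their own likelihood term, which this sketch omits. With one-hot features, the encoder reduces to a tabular posterior. Richer features would let it generalize across states.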

Similar Items