Staff View: Training a Generally Curious Agent :: Library Catalog

Bibliographic Details
Main Authors:	Tajwar, Fahim, Jiang, Yiding, Thankaraj, Abitha, Rahman, Sumaita Sadia, Kolter, J Zico, Schneider, Jeff, Salakhutdinov, Ruslan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2502.17543

author	Tajwar, Fahim; Jiang, Yiding; Thankaraj, Abitha; Rahman, Sumaita Sadia; Kolter, J Zico; Schneider, Jeff; Salakhutdinov, Ruslan
contents	Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information gathering. In this paper, we present Paprika, a fine-tuning approach that enables language models to develop general decision-making capabilities that are not confined to particular environments. By training on synthetic interaction data from different tasks that require diverse strategies, Paprika teaches models to explore and adapt their behavior on a new task based on environment feedback in context, without further gradient updates. Experimental results show that models fine-tuned with Paprika can effectively transfer their learned decision-making capabilities to entirely unseen tasks without additional training. Unlike traditional training, our approach's primary bottleneck lies in sampling useful interaction data rather than in model updates. To improve sample efficiency, we propose a curriculum learning strategy that prioritizes sampling trajectories from tasks with high learning potential. These results suggest a promising path towards AI systems that can autonomously solve novel sequential decision-making problems that require interactions with the external world.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_17543
institution	arXiv
publishDate	2025
record_format	arxiv
title	Training a Generally Curious Agent
topic	Machine Learning; Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2502.17543
