Bibliographic details
Main Authors: Tajwar, Fahim, Jiang, Yiding, Thankaraj, Abitha, Rahman, Sumaita Sadia, Kolter, J Zico, Schneider, Jeff, Salakhutdinov, Ruslan
Format: Preprint
Published: 2025
Subjects:
Online access: https://arxiv.org/abs/2502.17543
_version_ 1866911241769844736
author Tajwar, Fahim
Jiang, Yiding
Thankaraj, Abitha
Rahman, Sumaita Sadia
Kolter, J Zico
Schneider, Jeff
Salakhutdinov, Ruslan
contents Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information gathering. In this paper, we present Paprika, a fine-tuning approach that enables language models to develop general decision-making capabilities that are not confined to particular environments. By training on synthetic interaction data from different tasks that require diverse strategies, Paprika teaches models to explore and adapt their behavior on a new task based on environment feedback in-context without more gradient updates. Experimental results show that models fine-tuned with Paprika can effectively transfer their learned decision-making capabilities to entirely unseen tasks without additional training. Unlike traditional training, our approach's primary bottleneck lies in sampling useful interaction data instead of model updates. To improve sample efficiency, we propose a curriculum learning strategy that prioritizes sampling trajectories from tasks with high learning potential. These results suggest a promising path towards AI systems that can autonomously solve novel sequential decision-making problems that require interactions with the external world.
format Preprint
id arxiv_https___arxiv_org_abs_2502_17543
institution arXiv
publishDate 2025
record_format arxiv
title Training a Generally Curious Agent
topic Machine Learning
Artificial Intelligence
Computation and Language
url https://arxiv.org/abs/2502.17543