Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Rietz, Finn, Smirnov, Oleg, Karimi, Sara, Cao, Lele
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.06358
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866915397310087168
author	Rietz, Finn Smirnov, Oleg Karimi, Sara Cao, Lele
author_facet	Rietz, Finn Smirnov, Oleg Karimi, Sara Cao, Lele
contents	Prompting has emerged as the dominant paradigm for adapting large, pre-trained transformer-based models to downstream tasks. The Prompting Decision Transformer (PDT) enables large-scale, multi-task offline Reinforcement Learning (RL) pre-training by leveraging stochastic trajectory prompts to identify the target task. However, these prompts are sampled uniformly from expert demonstrations, overlooking a critical limitation: not all prompts are equally informative for differentiating between tasks. This limits generalization and adaptation, especially in low-data or open-world settings where sample efficiency is crucial. To address this issue, we propose a lightweight, inference-time, bandit-based prompt-tuning framework. The bandit explores and optimizes trajectory prompt selection to enhance task performance, while avoiding costly fine-tuning of the transformer backbone. Our experiments indicate not only clear performance gains due to bandit-based prompt-tuning, but also better sample complexity, scalability, and prompt space exploration compared to prompt-tuning baselines. These results highlights the importance of adaptive prompt selection mechanisms for efficient generalization in offline multi-task RL.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_06358
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Prompt-Tuning Bandits: Enabling Few-Shot Generalization for Efficient Multi-Task Offline RL Rietz, Finn Smirnov, Oleg Karimi, Sara Cao, Lele Machine Learning Prompting has emerged as the dominant paradigm for adapting large, pre-trained transformer-based models to downstream tasks. The Prompting Decision Transformer (PDT) enables large-scale, multi-task offline Reinforcement Learning (RL) pre-training by leveraging stochastic trajectory prompts to identify the target task. However, these prompts are sampled uniformly from expert demonstrations, overlooking a critical limitation: not all prompts are equally informative for differentiating between tasks. This limits generalization and adaptation, especially in low-data or open-world settings where sample efficiency is crucial. To address this issue, we propose a lightweight, inference-time, bandit-based prompt-tuning framework. The bandit explores and optimizes trajectory prompt selection to enhance task performance, while avoiding costly fine-tuning of the transformer backbone. Our experiments indicate not only clear performance gains due to bandit-based prompt-tuning, but also better sample complexity, scalability, and prompt space exploration compared to prompt-tuning baselines. These results highlights the importance of adaptive prompt selection mechanisms for efficient generalization in offline multi-task RL.
title	Prompt-Tuning Bandits: Enabling Few-Shot Generalization for Efficient Multi-Task Offline RL
topic	Machine Learning
url	https://arxiv.org/abs/2502.06358

Similar Items