Bibliographic Details
Main Authors: Scarpellini, Gianluca; Konyushkova, Ksenia; Fantacci, Claudio; Paine, Tom Le; Chen, Yutian; Denil, Misha
Format: Preprint
Published: 2023
Online Access: https://arxiv.org/abs/2306.09800
Table of Contents:
  • This paper describes $\pi2\text{vec}$, a method for representing the behaviors of black-box policies as feature vectors. The policy representations capture, in a task-agnostic way, how the statistics of foundation model features change in response to policy behavior. They can be trained from offline data, which allows them to be used for offline policy selection. This work provides a key piece of a recipe for fusing three modern lines of research: offline policy evaluation as a counterpart to offline RL, foundation models as generic and powerful state representations, and efficient policy selection in resource-constrained environments.