Staff View: Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents :: Library Catalog

Bibliographic Details
Main Authors:	Rahman, Arrasy; Cui, Jiaxun; Stone, Peter
Format:	Preprint
Published:	2023
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2308.09595

_version_	1866914627731849216
author	Rahman, Arrasy; Cui, Jiaxun; Stone, Peter
author_facet	Rahman, Arrasy; Cui, Jiaxun; Stone, Peter
contents	Robustly cooperating with unseen agents and human partners presents significant challenges due to the diverse cooperative conventions these partners may adopt. Existing Ad Hoc Teamwork (AHT) methods address this challenge by training an agent with a population of diverse teammate policies obtained through maximizing specific diversity metrics. However, prior heuristic-based diversity metrics do not always maximize the agent's robustness in all cooperative problems. In this work, we first propose that maximizing an AHT agent's robustness requires it to emulate policies in the minimum coverage set (MCS), the set of best-response policies to any partner policies in the environment. We then introduce the L-BRDiv algorithm that generates a set of teammate policies that, when used for AHT training, encourage agents to emulate policies from the MCS. L-BRDiv works by solving a constrained optimization problem to jointly train teammate policies for AHT training and approximating AHT agent policies that are members of the MCS. We empirically demonstrate that L-BRDiv produces more robust AHT agents than state-of-the-art methods in a broader range of two-player cooperative problems without the need for extensive hyperparameter tuning for its objectives. Our study shows that L-BRDiv outperforms the baseline methods by prioritizing discovering distinct members of the MCS instead of repeatedly finding redundant policies.
format	Preprint
id	arxiv_https___arxiv_org_abs_2308_09595
institution	arXiv
publishDate	2023
record_format	arxiv
title	Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents
topic	Artificial Intelligence
url	https://arxiv.org/abs/2308.09595

Similar Items