Bibliographic Details
Main Authors: Wang, Kuan; Lu, Yadong; Santacroce, Michael; Gong, Yeyun; Zhang, Chao; Shen, Yelong
Format: Preprint
Published: 2023
Subjects: Computation and Language; Artificial Intelligence
Online Access: https://arxiv.org/abs/2310.01444
author Wang, Kuan
Lu, Yadong
Santacroce, Michael
Gong, Yeyun
Zhang, Chao
Shen, Yelong
contents Recent advances in large language models (LLMs) have demonstrated potential for LLM agents. To facilitate the training of these agents with both linguistic feedback and non-linguistic reward signals, we introduce Learning through Communication (LTC). We design a universal buffer to store all the feedback, and an iterative pipeline to enable an LLM agent to explore and update its policy in a given environment. To optimize agent interactions for task-specific learning with our universal buffer and pipeline, we introduce diverse communication patterns tailored for both single-agent and multi-agent environments. We evaluate the efficacy of our LTC approach on four diverse datasets: ALFWorld (single-agent), HotpotQA (multi-agent collaboration), Chameleon (multi-agent competition), and GSM8k (multi-agent teacher-student). On these datasets, LTC outperforms the supervised instruction fine-tuning baselines by 3.6% to 12%. These results highlight the versatility and efficiency of LTC in facilitating online adaptation for LLM agents.
format Preprint
id arxiv_https___arxiv_org_abs_2310_01444
institution arXiv
publishDate 2023
record_format arxiv
title Adapting LLM Agents with Universal Feedback in Communication
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2310.01444