Staff View: OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following :: Library Catalog

Bibliographic Details
Main Authors:	Shi, Haochen; Sun, Zhiyuan; Yuan, Xingdi; Côté, Marc-Alexandre; Liu, Bang
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2403.03017

_version_	1866929264941596672
author	Shi, Haochen; Sun, Zhiyuan; Yuan, Xingdi; Côté, Marc-Alexandre; Liu, Bang
author_facet	Shi, Haochen; Sun, Zhiyuan; Yuan, Xingdi; Côté, Marc-Alexandre; Liu, Bang
contents	Embodied Instruction Following (EIF) is a crucial task in embodied learning, requiring agents to interact with their environment through egocentric observations to fulfill natural language instructions. Recent advancements have seen a surge in employing large language models (LLMs) within a framework-centric approach to enhance performance in embodied learning tasks, including EIF. Despite these efforts, there exists a lack of a unified understanding regarding the impact of various components, ranging from visual perception to action execution, on task performance. To address this gap, we introduce OPEx, a comprehensive framework that delineates the core components essential for solving embodied learning tasks: Observer, Planner, and Executor. Through extensive evaluations, we provide a deep analysis of how each component influences EIF task performance. Furthermore, we innovate within this space by deploying a multi-agent dialogue strategy on a TextWorld counterpart, further enhancing task performance. Our findings reveal that LLM-centric design markedly improves EIF outcomes, identify visual perception and low-level action execution as critical bottlenecks, and demonstrate that augmenting LLMs with a multi-agent framework further elevates performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_03017
institution	arXiv
publishDate	2024
record_format	arxiv
title	OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following
topic	Artificial Intelligence
url	https://arxiv.org/abs/2403.03017

Similar Items