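| Main Authors: | Kagaya, Tomoyuki; Yuan, Thong Jing; Lou, Yuxuan; Karlekar, Jayashree; Pranata, Sugiri; Kinose, Akira; Oguri, Koki; Wick, Felix; You, Yang |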
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Machine Learning; Artificial Intelligence; Computation and Language |
| Online Access: | https://arxiv.org/abs/2402.03610 |
| _version_ | 1866910319100559360 |
|---|---|
| author | Kagaya, Tomoyuki; Yuan, Thong Jing; Lou, Yuxuan; Karlekar, Jayashree; Pranata, Sugiri; Kinose, Akira; Oguri, Koki; Wick, Felix; You, Yang |
| author_facet | Kagaya, Tomoyuki; Yuan, Thong Jing; Lou, Yuxuan; Karlekar, Jayashree; Pranata, Sugiri; Kinose, Akira; Oguri, Koki; Wick, Felix; You, Yang |
| contents | Owing to recent advancements, Large Language Models (LLMs) can now be deployed as agents for increasingly complex decision-making applications in areas including robotics, gaming, and API integration. However, reflecting past experiences in current decision-making processes, an innate human behavior, continues to pose significant challenges. To address this, we propose the Retrieval-Augmented Planning (RAP) framework, designed to dynamically leverage past experiences corresponding to the current situation and context, thereby enhancing agents' planning capabilities. RAP distinguishes itself by being versatile: it excels in both text-only and multimodal environments, making it suitable for a wide range of tasks. Empirical evaluations demonstrate RAP's effectiveness: it achieves state-of-the-art (SOTA) performance in textual scenarios and notably enhances multimodal LLM agents' performance for embodied tasks. These results highlight RAP's potential in advancing the functionality and applicability of LLM agents in complex, real-world applications. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2402_03610 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents |
| topic | Machine Learning; Artificial Intelligence; Computation and Language |
| url | https://arxiv.org/abs/2402.03610 |