Staff View: ELITE: Experiential Learning and Intent-Aware Transfer for Self-improving Embodied Agents :: Library Catalog

Bibliographic Details
Main Authors:	Wei, Bingqing; Xia, Zhongyu; Liu, Dingai; Zhou, Xiaoyu; Lin, Zhiwei; Wang, Yongtao
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.24018

_version_	1866914420533231616
author	Wei, Bingqing; Xia, Zhongyu; Liu, Dingai; Zhou, Xiaoyu; Lin, Zhiwei; Wang, Yongtao
author_facet	Wei, Bingqing; Xia, Zhongyu; Liu, Dingai; Zhou, Xiaoyu; Lin, Zhiwei; Wang, Yongtao
contents	Vision-language models (VLMs) have shown remarkable general capabilities, yet embodied agents built on them fail at complex tasks, often skipping critical steps, proposing invalid actions, and repeating mistakes. These failures arise from a fundamental gap between the static training data of VLMs and the physical interaction for embodied tasks. VLMs can learn rich semantic knowledge from static data but lack the ability to interact with the world. To address this issue, we introduce ELITE, an embodied agent framework with Experiential Learning and Intent-aware Transfer that enables agents to continuously learn from their own environment interaction experiences, and transfer acquired knowledge to procedurally similar tasks. ELITE operates through two synergistic mechanisms, i.e., self-reflective knowledge construction and intent-aware retrieval. Specifically, self-reflective knowledge construction extracts reusable strategies from execution trajectories and maintains an evolving strategy pool through structured refinement operations. Then, intent-aware retrieval identifies relevant strategies from the pool and applies them to current tasks. Experiments on the EB-ALFRED and EB-Habitat benchmarks show that ELITE achieves 9% and 5% performance improvement over base VLMs in the online setting without any supervision. In the supervised setting, ELITE generalizes effectively to unseen task categories, achieving better performance compared to state-of-the-art training-based methods. These results demonstrate the effectiveness of ELITE for bridging the gap between semantic understanding and reliable action execution.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_24018
institution	arXiv
publishDate	2026
record_format	arxiv
title	ELITE: Experiential Learning and Intent-Aware Transfer for Self-improving Embodied Agents
topic	Artificial Intelligence
url	https://arxiv.org/abs/2603.24018

Similar Items