Bibliographic Details
Main Authors: Li, Wenhao, Su, Xiu, Niu, Dan, Cao, Yichao, Xu, Hongyan, Qu, Zhe, Fan, Lei, You, Shan, Xu, Chang
Format: Preprint
Published: 2026
Subjects: Robotics
Online Access: https://arxiv.org/abs/2605.01191
author Li, Wenhao
Su, Xiu
Niu, Dan
Cao, Yichao
Xu, Hongyan
Qu, Zhe
Fan, Lei
You, Shan
Xu, Chang
contents Vision-language-action (VLA) models have advanced the field of embodied manipulation by harnessing broad world knowledge and strong generalization. However, current VLA models still face several key challenges, including limited reasoning capability, lack of status monitoring, and difficulty in self-correction. In this paper, we introduce Sentinel-VLA, a metacognitive VLA model equipped with an active "sentinel" module that monitors execution status in real time. Only when necessary, such as during initial planning or upon detecting an error, does the model trigger dynamic reasoning or formulate an error-recovery solution. This on-demand reasoning mechanism ensures robust decision-making while minimizing computational overhead. Notably, all training data (spanning 44 tasks and over 2.6 million transitions) is automatically generated and annotated through our designed pipeline. We also propose the Self-Evolving Continual Learning (SECL) algorithm, which allows Sentinel-VLA to identify its capability boundaries and automatically collect data for expansion, paired with an Orthogonal Continual Adapter (OC-Adapter) that constrains parameter updates to an orthogonal space, thereby preventing catastrophic forgetting. Real-world experiments demonstrate that Sentinel-VLA boosts the task success rate by over 30% compared to the SOTA model, PI0. We will open-source all the code, weights, and data generation pipeline.
format Preprint
id arxiv_https___arxiv_org_abs_2605_01191
institution arXiv
publishDate 2026
record_format arxiv
title Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery
topic Robotics
url https://arxiv.org/abs/2605.01191