Staff View: FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction

Bibliographic Details
Main Authors:	Yang, Yifan; Duan, Zhixiang; Xie, Tianshi; Cao, Fuyu; Shen, Pinxi; Song, Peili; Jin, Piaopiao; Sun, Guokang; Xu, Shaoqing; You, Yangwei; Liu, Jingtai
Format:	Preprint
Published:	2025
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2509.04018

_version_	1866912745234890752
author	Yang, Yifan; Duan, Zhixiang; Xie, Tianshi; Cao, Fuyu; Shen, Pinxi; Song, Peili; Jin, Piaopiao; Sun, Guokang; Xu, Shaoqing; You, Yangwei; Liu, Jingtai
author_facet	Yang, Yifan; Duan, Zhixiang; Xie, Tianshi; Cao, Fuyu; Shen, Pinxi; Song, Peili; Jin, Piaopiao; Sun, Guokang; Xu, Shaoqing; You, Yangwei; Liu, Jingtai
contents	Robotic manipulation is a fundamental component of automation. However, traditional perception-planning pipelines often fall short in open-ended tasks due to limited flexibility, while the architecture of a single end-to-end Vision-Language-Action (VLA) offers promising capabilities but lacks crucial mechanisms for anticipating and recovering from failure. To address these challenges, we propose FPC-VLA, a dual-model framework that integrates VLA with a supervisor for failure prediction and correction. The supervisor evaluates action viability through vision-language queries and generates corrective strategies when risks arise, trained efficiently without manual labeling. A dual-stream fusion module further refines actions by leveraging past predictions. Evaluation results on multiple simulation platforms (SIMPLER and LIBERO) and robot embodiments (WidowX, Google Robot, Franka) show that FPC-VLA outperforms state-of-the-art models in both zero-shot and fine-tuned settings. Successful real-world deployments on diverse, long-horizon tasks confirm FPC-VLA's strong generalization and practical utility for building more reliable autonomous systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_04018
institution	arXiv
publishDate	2025
record_format	arxiv
title	FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction
topic	Robotics
url	https://arxiv.org/abs/2509.04018