Bibliographic Details
Main Authors: Yang, Yifan; Duan, Zhixiang; Xie, Tianshi; Cao, Fuyu; Shen, Pinxi; Song, Peili; Jin, Piaopiao; Sun, Guokang; Xu, Shaoqing; You, Yangwei; Liu, Jingtai
Format: Preprint
Published: 2025
Subjects: Robotics
Online Access: https://arxiv.org/abs/2509.04018
author Yang, Yifan
Duan, Zhixiang
Xie, Tianshi
Cao, Fuyu
Shen, Pinxi
Song, Peili
Jin, Piaopiao
Sun, Guokang
Xu, Shaoqing
You, Yangwei
Liu, Jingtai
contents Robotic manipulation is a fundamental component of automation. However, traditional perception-planning pipelines often fall short in open-ended tasks due to limited flexibility, while the architecture of a single end-to-end Vision-Language-Action (VLA) offers promising capabilities but lacks crucial mechanisms for anticipating and recovering from failure. To address these challenges, we propose FPC-VLA, a dual-model framework that integrates VLA with a supervisor for failure prediction and correction. The supervisor evaluates action viability through vision-language queries and generates corrective strategies when risks arise, trained efficiently without manual labeling. A dual-stream fusion module further refines actions by leveraging past predictions. Evaluation results on multiple simulation platforms (SIMPLER and LIBERO) and robot embodiments (WidowX, Google Robot, Franka) show that FPC-VLA outperforms state-of-the-art models in both zero-shot and fine-tuned settings. Successful real-world deployments on diverse, long-horizon tasks confirm FPC-VLA's strong generalization and practical utility for building more reliable autonomous systems.
format Preprint
id arxiv_https___arxiv_org_abs_2509_04018
institution arXiv
publishDate 2025
record_format arxiv
title FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction
topic Robotics
url https://arxiv.org/abs/2509.04018