:: Library Catalog

Bibliographic Details
Main Authors:	Lin, Zijun; Duan, Jiafei; Fang, Haoquan; Fox, Dieter; Krishna, Ranjay; Tan, Cheston; Wen, Bihan
Format:	Preprint
Published:	2025
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2510.01642

Similar Items

Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning
by: Tur, Yalcin, et al.
Published: (2026)

SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
by: Fang, Haoquan, et al.
Published: (2025)

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
by: Duan, Jiafei, et al.
Published: (2024)

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models
by: Duan, Jiafei, et al.
Published: (2024)

EVE: Enabling Anyone to Train Robots using Augmented Reality
by: Wang, Jun, et al.
Published: (2024)

RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics
by: Yuan, Wentao, et al.
Published: (2024)

MolmoAct: Action Reasoning Models that can Reason in Space
by: Lee, Jason, et al.
Published: (2025)

THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation
by: Pumacay, Wilbert, et al.
Published: (2024)

VLS: Steering Pretrained Robot Policies via Vision-Language Models
by: Liu, Shuo, et al.
Published: (2026)

GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding
by: Lin, Zijun, et al.
Published: (2025)

MolmoAct2: Action Reasoning Models for Real-world Deployment
by: Fang, Haoquan, et al.
Published: (2026)

Failing Forward: Adaptive Failure-Informed Learning for Vision-Language-Action Models
by: Zheng, Meng, et al.
Published: (2026)

Octopi: Object Property Reasoning with Large Tactile-Language Models
by: Yu, Samson, et al.
Published: (2024)

TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
by: Chen, Shirui, et al.
Published: (2026)

FailSafe: High-performance Resilient Serving
by: Xu, Ziyi, et al.
Published: (2025)

10 Open Challenges Steering the Future of Vision-Language-Action Models
by: Poria, Soujanya, et al.
Published: (2025)

I-FailSense: Towards General Robotic Failure Detection with Vision-Language Models
by: Grislain, Clemence, et al.
Published: (2025)

Expect the Unexpected: FailSafe Long Context QA for Finance
by: Kamble, Kiran, et al.
Published: (2025)

RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation
by: Wang, Yi Ru, et al.
Published: (2025)

SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models
by: Ray, Arijit, et al.
Published: (2024)

Automating Robot Failure Recovery Using Vision-Language Models With Optimized Prompts
by: Chen, Hongyi, et al.
Published: (2024)

GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
by: Deshpande, Abhay, et al.
Published: (2025)

RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields
by: Sagar, Som, et al.
Published: (2024)

SAFE: Multitask Failure Detection for Vision-Language-Action Models
by: Gu, Qiao, et al.
Published: (2025)

RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
by: Liufu, Weijia, et al.
Published: (2026)

MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation
by: Deshpande, Abhay, et al.
Published: (2026)

CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation
by: Su, Xia, et al.
Published: (2026)

OG-VLA: Orthographic Image Generation for 3D-Aware Vision-Language Action Model
by: Singh, Ishika, et al.
Published: (2025)

FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction
by: Yang, Yifan, et al.
Published: (2025)

Hierarchical Vision Language Action Model Using Success and Failure Demonstrations
by: Park, Jeongeun, et al.
Published: (2025)

Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models
by: Xu, Haiweng, et al.
Published: (2026)

OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
by: Lin, Fanqi, et al.
Published: (2025)

Guiding Long-Horizon Task and Motion Planning with Vision Language Models
by: Yang, Zhutian, et al.
Published: (2024)

A Human-in-the-Loop Confidence-Aware Failure Recovery Framework for Modular Robot Policies
by: Banerjee, Rohan, et al.
Published: (2026)

RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models
by: Chen, Yuxuan, et al.
Published: (2025)

Zero-shot Object Navigation with Vision-Language Models Reasoning
by: Wen, Congcong, et al.
Published: (2024)

RoVLA: Multi-Consistency Constraints for Robust Vision-Language-Action Models
by: Luo, Jingzhou, et al.
Published: (2026)

Self-Refining Vision Language Model for Robotic Failure Detection and Reasoning
by: Qi, Carl, et al.
Published: (2026)

InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning
by: Zhang, Ji, et al.
Published: (2025)

Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models
by: Liu, Haoyun, et al.
Published: (2026)