| Main Authors: | Lin, Zijun, Duan, Jiafei, Fang, Haoquan, Fox, Dieter, Krishna, Ranjay, Tan, Cheston, Wen, Bihan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.01642 |
Similar Items
Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning
by: Tur, Yalcin, et al.
Published: (2026)
SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
by: Fang, Haoquan, et al.
Published: (2025)
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
by: Duan, Jiafei, et al.
Published: (2024)
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models
by: Duan, Jiafei, et al.
Published: (2024)
EVE: Enabling Anyone to Train Robots using Augmented Reality
by: Wang, Jun, et al.
Published: (2024)
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics
by: Yuan, Wentao, et al.
Published: (2024)
MolmoAct: Action Reasoning Models that can Reason in Space
by: Lee, Jason, et al.
Published: (2025)
THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation
by: Pumacay, Wilbert, et al.
Published: (2024)
VLS: Steering Pretrained Robot Policies via Vision-Language Models
by: Liu, Shuo, et al.
Published: (2026)
GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding
by: Lin, Zijun, et al.
Published: (2025)
MolmoAct2: Action Reasoning Models for Real-world Deployment
by: Fang, Haoquan, et al.
Published: (2026)
Failing Forward: Adaptive Failure-Informed Learning for Vision-Language-Action Models
by: Zheng, Meng, et al.
Published: (2026)
Octopi: Object Property Reasoning with Large Tactile-Language Models
by: Yu, Samson, et al.
Published: (2024)
TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
by: Chen, Shirui, et al.
Published: (2026)
FailSafe: High-performance Resilient Serving
by: Xu, Ziyi, et al.
Published: (2025)
10 Open Challenges Steering the Future of Vision-Language-Action Models
by: Poria, Soujanya, et al.
Published: (2025)
I-FailSense: Towards General Robotic Failure Detection with Vision-Language Models
by: Grislain, Clemence, et al.
Published: (2025)
Expect the Unexpected: FailSafe Long Context QA for Finance
by: Kamble, Kiran, et al.
Published: (2025)
RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation
by: Wang, Yi Ru, et al.
Published: (2025)
SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models
by: Ray, Arijit, et al.
Published: (2024)
Automating Robot Failure Recovery Using Vision-Language Models With Optimized Prompts
by: Chen, Hongyi, et al.
Published: (2024)
GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
by: Deshpande, Abhay, et al.
Published: (2025)
RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields
by: Sagar, Som, et al.
Published: (2024)
SAFE: Multitask Failure Detection for Vision-Language-Action Models
by: Gu, Qiao, et al.
Published: (2025)
RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
by: Liufu, Weijia, et al.
Published: (2026)
MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation
by: Deshpande, Abhay, et al.
Published: (2026)
CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation
by: Su, Xia, et al.
Published: (2026)
OG-VLA: Orthographic Image Generation for 3D-Aware Vision-Language Action Model
by: Singh, Ishika, et al.
Published: (2025)
FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction
by: Yang, Yifan, et al.
Published: (2025)
Hierarchical Vision Language Action Model Using Success and Failure Demonstrations
by: Park, Jeongeun, et al.
Published: (2025)
Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models
by: Xu, Haiweng, et al.
Published: (2026)
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
by: Lin, Fanqi, et al.
Published: (2025)
Guiding Long-Horizon Task and Motion Planning with Vision Language Models
by: Yang, Zhutian, et al.
Published: (2024)
A Human-in-the-Loop Confidence-Aware Failure Recovery Framework for Modular Robot Policies
by: Banerjee, Rohan, et al.
Published: (2026)
RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models
by: Chen, Yuxuan, et al.
Published: (2025)
Zero-shot Object Navigation with Vision-Language Models Reasoning
by: Wen, Congcong, et al.
Published: (2024)
RoVLA: Multi-Consistency Constraints for Robust Vision-Language-Action Models
by: Luo, Jingzhou, et al.
Published: (2026)
Self-Refining Vision Language Model for Robotic Failure Detection and Reasoning
by: Qi, Carl, et al.
Published: (2026)
InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning
by: Zhang, Ji, et al.
Published: (2025)
Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models
by: Liu, Haoyun, et al.
Published: (2026)