| Main Authors: | Tur, Yalcin, Naghiyev, Jalal, Fang, Haoquan, Tsai, Wei-Chuan, Duan, Jiafei, Fox, Dieter, Krishna, Ranjay |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.07845 |
Similar Items
FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models
by: Lin, Zijun, et al.
Published: (2025)
SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
by: Fang, Haoquan, et al.
Published: (2025)
EVE: Enabling Anyone to Train Robots using Augmented Reality
by: Wang, Jun, et al.
Published: (2024)
THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation
by: Pumacay, Wilbert, et al.
Published: (2024)
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models
by: Duan, Jiafei, et al.
Published: (2024)
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics
by: Yuan, Wentao, et al.
Published: (2024)
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
by: Duan, Jiafei, et al.
Published: (2024)
MolmoAct: Action Reasoning Models that can Reason in Space
by: Lee, Jason, et al.
Published: (2025)
VLS: Steering Pretrained Robot Policies via Vision-Language Models
by: Liu, Shuo, et al.
Published: (2026)
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
by: Geiping, Jonas, et al.
Published: (2025)
Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors
by: Rofin, Mark, et al.
Published: (2026)
vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models
by: Choi, Suhwan, et al.
Published: (2026)
MolmoAct2: Action Reasoning Models for Real-world Deployment
by: Fang, Haoquan, et al.
Published: (2026)
Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models
by: Bai, Shuanghao, et al.
Published: (2026)
DepthVLA: Enhancing Vision-Language-Action Models with Depth-Aware Spatial Reasoning
by: Yuan, Tianyuan, et al.
Published: (2025)
Iterated Learning Improves Compositionality in Large Vision-Language Models
by: Zheng, Chenhao, et al.
Published: (2024)
TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
by: Chen, Shirui, et al.
Published: (2026)
Selective Visual Representations Improve Convergence and Generalization for Embodied AI
by: Eftekhar, Ainaz, et al.
Published: (2023)
PointArena: Probing Multimodal Grounding Through Language-Guided Pointing
by: Cheng, Long, et al.
Published: (2025)
OG-VLA: Orthographic Image Generation for 3D-Aware Vision-Language Action Model
by: Singh, Ishika, et al.
Published: (2025)
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling
by: Rodkin, Ivan, et al.
Published: (2025)
Latent Transfer Attack: Adversarial Examples via Generative Latent Spaces
by: Shaar, Eitan, et al.
Published: (2026)
RotVLA: Rotational Latent Action for Vision-Language-Action Model
by: Li, Qiwei, et al.
Published: (2026)
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
by: Song, Mingyang, et al.
Published: (2026)
Scale Can't Overcome Pragmatics: The Impact of Reporting Bias on Vision-Language Reasoning
by: Kamath, Amita, et al.
Published: (2026)
Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training
by: Nepal, Aadim, et al.
Published: (2025)
Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
by: Kohli, Harsh, et al.
Published: (2026)
GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model
by: Abouzeid, Ali, et al.
Published: (2025)
LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models
by: Shen, Boyang, et al.
Published: (2026)
Two-Scale Latent Dynamics for Recurrent-Depth Transformers
by: Pappone, Francesco, et al.
Published: (2025)
GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
by: Deshpande, Abhay, et al.
Published: (2025)
QuoVLA: Quotient Space for Vision-Language-Action Models
by: Wang, Xuan, et al.
Published: (2026)
WFM: 3D Wavelet Flow Matching for Ultrafast Multi-Modal MRI Synthesis
by: Tur, Yalcin, et al.
Published: (2026)
OccVLA: Vision-Language-Action Model with Implicit 3D Occupancy Supervision
by: Liu, Ruixun, et al.
Published: (2025)
Ablate-to-Validate: Are Vision-Language Models Really Using Continuous Thought Tokens?
by: Zhang, Tianyi, et al.
Published: (2026)
VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
by: Sun, Jingwen, et al.
Published: (2026)
VLA-ATTC: Adaptive Test-Time Compute for VLA Models with Relative Action Critic Model
by: Li, Wenhao, et al.
Published: (2026)
QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
by: Li, Yixuan, et al.
Published: (2025)
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
by: Ye, Angen, et al.
Published: (2025)
RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation
by: Wang, Yi Ru, et al.
Published: (2025)