:: Library Catalog

Bibliographic Details
Main Authors:	Tur, Yalcin; Naghiyev, Jalal; Fang, Haoquan; Tsai, Wei-Chuan; Duan, Jiafei; Fox, Dieter; Krishna, Ranjay
Format:	Preprint
Published:	2026
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2602.07845

Similar Items

FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models
by: Lin, Zijun, et al.
Published: (2025)

SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
by: Fang, Haoquan, et al.
Published: (2025)

EVE: Enabling Anyone to Train Robots using Augmented Reality
by: Wang, Jun, et al.
Published: (2024)

THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation
by: Pumacay, Wilbert, et al.
Published: (2024)

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models
by: Duan, Jiafei, et al.
Published: (2024)

RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics
by: Yuan, Wentao, et al.
Published: (2024)

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
by: Duan, Jiafei, et al.
Published: (2024)

MolmoAct: Action Reasoning Models that can Reason in Space
by: Lee, Jason, et al.
Published: (2025)

VLS: Steering Pretrained Robot Policies via Vision-Language Models
by: Liu, Shuo, et al.
Published: (2026)

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
by: Geiping, Jonas, et al.
Published: (2025)

Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors
by: Rofin, Mark, et al.
Published: (2026)

vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models
by: Choi, Suhwan, et al.
Published: (2026)

MolmoAct2: Action Reasoning Models for Real-world Deployment
by: Fang, Haoquan, et al.
Published: (2026)

Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models
by: Bai, Shuanghao, et al.
Published: (2026)

DepthVLA: Enhancing Vision-Language-Action Models with Depth-Aware Spatial Reasoning
by: Yuan, Tianyuan, et al.
Published: (2025)

Iterated Learning Improves Compositionality in Large Vision-Language Models
by: Zheng, Chenhao, et al.
Published: (2024)

TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
by: Chen, Shirui, et al.
Published: (2026)

Selective Visual Representations Improve Convergence and Generalization for Embodied AI
by: Eftekhar, Ainaz, et al.
Published: (2023)

PointArena: Probing Multimodal Grounding Through Language-Guided Pointing
by: Cheng, Long, et al.
Published: (2025)

OG-VLA: Orthographic Image Generation for 3D-Aware Vision-Language Action Model
by: Singh, Ishika, et al.
Published: (2025)

Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling
by: Rodkin, Ivan, et al.
Published: (2025)

Latent Transfer Attack: Adversarial Examples via Generative Latent Spaces
by: Shaar, Eitan, et al.
Published: (2026)

RotVLA: Rotational Latent Action for Vision-Language-Action Model
by: Li, Qiwei, et al.
Published: (2026)

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
by: Song, Mingyang, et al.
Published: (2026)

Scale Can't Overcome Pragmatics: The Impact of Reporting Bias on Vision-Language Reasoning
by: Kamath, Amita, et al.
Published: (2026)

Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training
by: Nepal, Aadim, et al.
Published: (2025)

Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
by: Kohli, Harsh, et al.
Published: (2026)

GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model
by: Abouzeid, Ali, et al.
Published: (2025)

LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models
by: Shen, Boyang, et al.
Published: (2026)

Two-Scale Latent Dynamics for Recurrent-Depth Transformers
by: Pappone, Francesco, et al.
Published: (2025)

GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
by: Deshpande, Abhay, et al.
Published: (2025)

QuoVLA: Quotient Space for Vision-Language-Action Models
by: Wang, Xuan, et al.
Published: (2026)

WFM: 3D Wavelet Flow Matching for Ultrafast Multi-Modal MRI Synthesis
by: Tur, Yalcin, et al.
Published: (2026)

OccVLA: Vision-Language-Action Model with Implicit 3D Occupancy Supervision
by: Liu, Ruixun, et al.
Published: (2025)

Ablate-to-Validate: Are Vision-Language Models Really Using Continuous Thought Tokens?
by: Zhang, Tianyi, et al.
Published: (2026)

VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
by: Sun, Jingwen, et al.
Published: (2026)

VLA-ATTC: Adaptive Test-Time Compute for VLA Models with Relative Action Critic Model
by: Li, Wenhao, et al.
Published: (2026)

QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
by: Li, Yixuan, et al.
Published: (2025)

VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
by: Ye, Angen, et al.
Published: (2025)

RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation
by: Wang, Yi Ru, et al.
Published: (2025)