:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Rao, Zhifeng, Chen, Wenlong, Xie, Lei, Hua, Xia, Yin, Dongfu, Tian, Zhen, Yu, F. Richard
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.10698
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

DepthVLA: Enhancing Vision-Language-Action Models with Depth-Aware Spatial Reasoning
by: Yuan, Tianyuan, et al.
Published: (2025)

SaiVLA-0: Cerebrum--Pons--Cerebellum Tripartite Architecture for Compute-Aware Vision-Language-Action
by: Shi, Xiang, et al.
Published: (2026)

AIR-VLA: Vision-Language-Action Systems for Aerial Manipulation
by: Sun, Jianli, et al.
Published: (2026)

GeoVLA: Empowering 3D Representations in Vision-Language-Action Models
by: Sun, Lin, et al.
Published: (2025)

EdgeVLA: Efficient Vision-Language-Action Models
by: Budzianowski, Paweł, et al.
Published: (2025)

QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
by: Li, Yixuan, et al.
Published: (2025)

3D-VLA: A 3D Vision-Language-Action Generative World Model
by: Zhen, Haoyu, et al.
Published: (2024)

GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models
by: Sarowar, Md Selim, et al.
Published: (2026)

Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance
by: Wang, Songsheng, et al.
Published: (2025)

DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models
by: Yin, Cheng, et al.
Published: (2025)

LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models
by: Shen, Boyang, et al.
Published: (2026)

AffordVLA: Injecting Affordance Representations into Vision-Language-Action Models via Implicit Feature Alignment
by: Kong, Weijie, et al.
Published: (2026)

MAP-VLA: Memory-Augmented Prompting for Vision-Language-Action Model in Robotic Manipulation
by: Li, Runhao, et al.
Published: (2025)

E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes
by: Zhai, Jiajun, et al.
Published: (2026)

AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention
by: Xiao, Lei, et al.
Published: (2025)

DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models
by: Xu, Zonghuan, et al.
Published: (2025)

PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
by: Guo, Xinyu, et al.
Published: (2026)

AC^2-VLA: Action-Context-Aware Adaptive Computation in Vision-Language-Action Models for Efficient Robotic Manipulation
by: Yu, Wenda, et al.
Published: (2026)

VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning
by: Wang, Chaoyang, et al.
Published: (2026)

AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation
by: Wu, Yangchao, et al.
Published: (2023)

RLinf-VLA: A Unified and Efficient Framework for Reinforcement Learning of Vision-Language-Action Models
by: Zang, Hongzhi, et al.
Published: (2025)

RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
by: Liufu, Weijia, et al.
Published: (2026)

FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models
by: Wang, Xin, et al.
Published: (2025)

IA-VLA: Input Augmentation for Vision-Language-Action models in settings with semantically complex tasks
by: Hannus, Eric, et al.
Published: (2025)

4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
by: Zhang, Jiahui, et al.
Published: (2025)

CRL-VLA: Continual Vision-Language-Action Learning
by: Zeng, Qixin, et al.
Published: (2026)

SwitchVLA: Execution-Aware Task Switching for Vision-Language-Action Models
by: Li, Meng, et al.
Published: (2025)

RotVLA: Rotational Latent Action for Vision-Language-Action Model
by: Li, Qiwei, et al.
Published: (2026)

ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
by: Zhong, Linqing, et al.
Published: (2026)

DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
by: Xie, Haozhe, et al.
Published: (2026)

PointVLA: Injecting the 3D World into Vision-Language-Action Models
by: Li, Chengmeng, et al.
Published: (2025)

ROI-Driven Foveated Attention for Unified Egocentric Representations in Vision-Language-Action Systems
by: Sun, Xinhai, et al.
Published: (2026)

VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
by: Xu, Siyu, et al.
Published: (2025)

StyleVLA: Driving Style-Aware Vision Language Action Model for Autonomous Driving
by: Gao, Yuan, et al.
Published: (2026)

ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models
by: Li, Ye, et al.
Published: (2026)

CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving
by: Arai, Hidehisa, et al.
Published: (2024)

Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
by: Liang, Zhixuan, et al.
Published: (2025)

StereoVLA: Enhancing Vision-Language-Action Models with Stereo Vision
by: Deng, Shengliang, et al.
Published: (2025)

CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies
by: Du, Fan, et al.
Published: (2026)

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
by: Yang, Shuai, et al.
Published: (2025)