:: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Zheyuan; Du, Liang; Sun, Zixun; Zhou, Xiaoyu; Ye, Ruimin; Chen, Qihao; Chen, Yinda; Qiu, Lemiao
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.02212

Similar Items

Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving
by: Zhang, Dapeng, et al.
Published: (2025)

CAD-Judge: Toward Efficient Morphological Grading and Verification for Text-to-CAD Generation
by: Zhou, Zheyuan, et al.
Published: (2025)

RotVLA: Rotational Latent Action for Vision-Language-Action Model
by: Li, Qiwei, et al.
Published: (2026)

BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models
by: Ma, Xiaoyu, et al.
Published: (2025)

VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)

CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model
by: Zhang, Dapeng, et al.
Published: (2025)

VLA-Trace: Diagnosing Vision-Language-Action Models through Representation and Behavior Tracing
by: Shi, Haoyuan, et al.
Published: (2026)

R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection
by: Zhou, Zheyuan, et al.
Published: (2024)

Pure Vision Language Action (VLA) Models: A Comprehensive Survey
by: Zhang, Dapeng, et al.
Published: (2025)

VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
by: Wang, Zixuan, et al.
Published: (2026)

RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
by: Liufu, Weijia, et al.
Published: (2026)

3D-VLA: A 3D Vision-Language-Action Generative World Model
by: Zhen, Haoyu, et al.
Published: (2024)

FutureVLA: Joint Visuomotor Prediction for Vision-Language-Action Model
by: Xu, Xiaoxu, et al.
Published: (2026)

VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
by: Sun, Jingwen, et al.
Published: (2026)

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments
by: Wang, Qiuyue, et al.
Published: (2026)

QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
by: Li, Yixuan, et al.
Published: (2025)

BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization
by: Zhou, Xueyang, et al.
Published: (2025)

ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
by: Song, Wenxuan, et al.
Published: (2025)

GesVLA: Gesture-Aware Vision-Language-Action Model Embedded Representations
by: Guo, Wenxuan, et al.
Published: (2026)

EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models
by: Jiang, Feng, et al.
Published: (2025)

Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models
by: Bai, Shuanghao, et al.
Published: (2026)

ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations
by: Zhou, Yuhao, et al.
Published: (2026)

LLaDA-VLA: Vision Language Diffusion Action Models
by: Wen, Yuqing, et al.
Published: (2025)

PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
by: Liang, Wenqi, et al.
Published: (2025)

VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
by: Ye, Angen, et al.
Published: (2025)

FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models
by: Wang, Xin, et al.
Published: (2025)

Counterfactual VLA: Self-Reflective Vision-Language-Action Model with Adaptive Reasoning
by: Peng, Zhenghao "Mark", et al.
Published: (2025)

ST4VLA: Spatially Guided Training for Vision-Language-Action Models
by: Ye, Jinhui, et al.
Published: (2026)

HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
by: Liu, Jiaming, et al.
Published: (2025)

MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation
by: Shi, Hao, et al.
Published: (2025)

StableVLA: Towards Robust Vision-Language-Action Models without Extra Data
by: Fu, Yiyang, et al.
Published: (2026)

DSOcc: Leveraging Depth Awareness and Semantic Aid to Boost Camera-Based 3D Semantic Occupancy Prediction
by: Fang, Naiyu, et al.
Published: (2025)

OccLE: Label-Efficient 3D Semantic Occupancy Prediction
by: Fang, Naiyu, et al.
Published: (2025)

PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
by: Guo, Xinyu, et al.
Published: (2026)

SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model
by: Zhou, Zewei, et al.
Published: (2026)

StarVLA-$α$: Reducing Complexity in Vision-Language-Action Systems
by: Ye, Jinhui, et al.
Published: (2026)

OpenVLA: An Open-Source Vision-Language-Action Model
by: Kim, Moo Jin, et al.
Published: (2024)

GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model
by: Abouzeid, Ali, et al.
Published: (2025)

UrbanVLA: A Vision-Language-Action Model for Urban Micromobility
by: Li, Anqi, et al.
Published: (2025)

EdgeVLA: Efficient Vision-Language-Action Models
by: Budzianowski, Paweł, et al.
Published: (2025)