Saved in:
| Main Authors: | Zhou, Zheyuan, Du, Liang, Sun, Zixun, Zhou, Xiaoyu, Ye, Ruimin, Chen, Qihao, Chen, Yinda, Qiu, Lemiao |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.02212 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving
by: Zhang, Dapeng, et al.
Published: (2025)
by: Zhang, Dapeng, et al.
Published: (2025)
CAD-Judge: Toward Efficient Morphological Grading and Verification for Text-to-CAD Generation
by: Zhou, Zheyuan, et al.
Published: (2025)
by: Zhou, Zheyuan, et al.
Published: (2025)
RotVLA: Rotational Latent Action for Vision-Language-Action Model
by: Li, Qiwei, et al.
Published: (2026)
by: Li, Qiwei, et al.
Published: (2026)
BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models
by: Ma, Xiaoyu, et al.
Published: (2025)
by: Ma, Xiaoyu, et al.
Published: (2025)
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)
by: Zhang, Jianke, et al.
Published: (2026)
CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model
by: Zhang, Dapeng, et al.
Published: (2025)
by: Zhang, Dapeng, et al.
Published: (2025)
VLA-Trace: Diagnosing Vision-Language-Action Models through Representation and Behavior Tracing
by: Shi, Haoyuan, et al.
Published: (2026)
by: Shi, Haoyuan, et al.
Published: (2026)
R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection
by: Zhou, Zheyuan, et al.
Published: (2024)
by: Zhou, Zheyuan, et al.
Published: (2024)
Pure Vision Language Action (VLA) Models: A Comprehensive Survey
by: Zhang, Dapeng, et al.
Published: (2025)
by: Zhang, Dapeng, et al.
Published: (2025)
VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
by: Wang, Zixuan, et al.
Published: (2026)
by: Wang, Zixuan, et al.
Published: (2026)
RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
by: Liufu, Weijia, et al.
Published: (2026)
by: Liufu, Weijia, et al.
Published: (2026)
3D-VLA: A 3D Vision-Language-Action Generative World Model
by: Zhen, Haoyu, et al.
Published: (2024)
by: Zhen, Haoyu, et al.
Published: (2024)
FutureVLA: Joint Visuomotor Prediction for Vision-Language-Action Model
by: Xu, Xiaoxu, et al.
Published: (2026)
by: Xu, Xiaoxu, et al.
Published: (2026)
VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
by: Sun, Jingwen, et al.
Published: (2026)
by: Sun, Jingwen, et al.
Published: (2026)
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments
by: Wang, Qiuyue, et al.
Published: (2026)
by: Wang, Qiuyue, et al.
Published: (2026)
QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
by: Li, Yixuan, et al.
Published: (2025)
by: Li, Yixuan, et al.
Published: (2025)
BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization
by: Zhou, Xueyang, et al.
Published: (2025)
by: Zhou, Xueyang, et al.
Published: (2025)
ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
by: Song, Wenxuan, et al.
Published: (2025)
by: Song, Wenxuan, et al.
Published: (2025)
GesVLA: Gesture-Aware Vision-Language-Action Model Embedded Representations
by: Guo, Wenxuan, et al.
Published: (2026)
by: Guo, Wenxuan, et al.
Published: (2026)
EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models
by: Jiang, Feng, et al.
Published: (2025)
by: Jiang, Feng, et al.
Published: (2025)
Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models
by: Bai, Shuanghao, et al.
Published: (2026)
by: Bai, Shuanghao, et al.
Published: (2026)
ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations
by: Zhou, Yuhao, et al.
Published: (2026)
by: Zhou, Yuhao, et al.
Published: (2026)
LLaDA-VLA: Vision Language Diffusion Action Models
by: Wen, Yuqing, et al.
Published: (2025)
by: Wen, Yuqing, et al.
Published: (2025)
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
by: Liang, Wenqi, et al.
Published: (2025)
by: Liang, Wenqi, et al.
Published: (2025)
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
by: Ye, Angen, et al.
Published: (2025)
by: Ye, Angen, et al.
Published: (2025)
FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models
by: Wang, Xin, et al.
Published: (2025)
by: Wang, Xin, et al.
Published: (2025)
Counterfactual VLA: Self-Reflective Vision-Language-Action Model with Adaptive Reasoning
by: Peng, Zhenghao "Mark", et al.
Published: (2025)
by: Peng, Zhenghao "Mark", et al.
Published: (2025)
ST4VLA: Spatially Guided Training for Vision-Language-Action Models
by: Ye, Jinhui, et al.
Published: (2026)
by: Ye, Jinhui, et al.
Published: (2026)
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
by: Liu, Jiaming, et al.
Published: (2025)
by: Liu, Jiaming, et al.
Published: (2025)
MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation
by: Shi, Hao, et al.
Published: (2025)
by: Shi, Hao, et al.
Published: (2025)
StableVLA: Towards Robust Vision-Language-Action Models without Extra Data
by: Fu, Yiyang, et al.
Published: (2026)
by: Fu, Yiyang, et al.
Published: (2026)
DSOcc: Leveraging Depth Awareness and Semantic Aid to Boost Camera-Based 3D Semantic Occupancy Prediction
by: Fang, Naiyu, et al.
Published: (2025)
by: Fang, Naiyu, et al.
Published: (2025)
OccLE: Label-Efficient 3D Semantic Occupancy Prediction
by: Fang, Naiyu, et al.
Published: (2025)
by: Fang, Naiyu, et al.
Published: (2025)
PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
by: Guo, Xinyu, et al.
Published: (2026)
by: Guo, Xinyu, et al.
Published: (2026)
SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model
by: Zhou, Zewei, et al.
Published: (2026)
by: Zhou, Zewei, et al.
Published: (2026)
StarVLA-$α$: Reducing Complexity in Vision-Language-Action Systems
by: Ye, Jinhui, et al.
Published: (2026)
by: Ye, Jinhui, et al.
Published: (2026)
OpenVLA: An Open-Source Vision-Language-Action Model
by: Kim, Moo Jin, et al.
Published: (2024)
by: Kim, Moo Jin, et al.
Published: (2024)
GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model
by: Abouzeid, Ali, et al.
Published: (2025)
by: Abouzeid, Ali, et al.
Published: (2025)
UrbanVLA: A Vision-Language-Action Model for Urban Micromobility
by: Li, Anqi, et al.
Published: (2025)
by: Li, Anqi, et al.
Published: (2025)
EdgeVLA: Efficient Vision-Language-Action Models
by: Budzianowski, Paweł, et al.
Published: (2025)
by: Budzianowski, Paweł, et al.
Published: (2025)
Similar Items
-
Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving
by: Zhang, Dapeng, et al.
Published: (2025) -
CAD-Judge: Toward Efficient Morphological Grading and Verification for Text-to-CAD Generation
by: Zhou, Zheyuan, et al.
Published: (2025) -
RotVLA: Rotational Latent Action for Vision-Language-Action Model
by: Li, Qiwei, et al.
Published: (2026) -
BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models
by: Ma, Xiaoyu, et al.
Published: (2025) -
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)