| Main Authors: | Xie, Yuhan, Yan, Yuping, Zhao, Yunqi, Wang, Handing, Jin, Yaochu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.10055 |
Similar Items
When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models
by: Yan, Yuping, et al.
Published: (2025)
RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models
by: Zhang, Hongyin, et al.
Published: (2025)
StereoVLA: Enhancing Vision-Language-Action Models with Stereo Vision
by: Deng, Shengliang, et al.
Published: (2025)
CRL-VLA: Continual Vision-Language-Action Learning
by: Zeng, Qixin, et al.
Published: (2026)
PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
by: Guo, Xinyu, et al.
Published: (2026)
NS-VLA: Towards Neuro-Symbolic Vision-Language-Action Models
by: Zhu, Ziyue, et al.
Published: (2026)
SwitchVLA: Execution-Aware Task Switching for Vision-Language-Action Models
by: Li, Meng, et al.
Published: (2025)
EdgeVLA: Efficient Vision-Language-Action Models
by: Budzianowski, Paweł, et al.
Published: (2025)
GeoVLA: Empowering 3D Representations in Vision-Language-Action Models
by: Sun, Lin, et al.
Published: (2025)
RoVLA: Multi-Consistency Constraints for Robust Vision-Language-Action Models
by: Luo, Jingzhou, et al.
Published: (2026)
SG-VLA: Learning Spatially-Grounded Vision-Language-Action Models for Mobile Manipulation
by: Tu, Ruisen, et al.
Published: (2026)
ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models
by: Li, Ye, et al.
Published: (2026)
LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models
by: Shen, Boyang, et al.
Published: (2026)
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
by: Wang, Yihao, et al.
Published: (2025)
LLaDA-VLA: Vision Language Diffusion Action Models
by: Wen, Yuqing, et al.
Published: (2025)
ST4VLA: Spatially Guided Training for Vision-Language-Action Models
by: Ye, Jinhui, et al.
Published: (2026)
RLinf-VLA: A Unified and Efficient Framework for Reinforcement Learning of Vision-Language-Action Models
by: Zang, Hongzhi, et al.
Published: (2025)
ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
by: Zhong, Linqing, et al.
Published: (2026)
CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding
by: Song, Wenxuan, et al.
Published: (2025)
Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models
by: Jin, Ruofan, et al.
Published: (2026)
Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models
by: Bai, Shuanghao, et al.
Published: (2026)
OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL
by: Jie, Haoxiang, et al.
Published: (2026)
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
by: Ding, Pengxiang, et al.
Published: (2023)
dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought
by: Wen, Junjie, et al.
Published: (2025)
OpenVLA: An Open-Source Vision-Language-Action Model
by: Kim, Moo Jin, et al.
Published: (2024)
RationalVLA: A Rational Vision-Language-Action Model with Dual System
by: Song, Wenxuan, et al.
Published: (2025)
Counterfactual VLA: Self-Reflective Vision-Language-Action Model with Adaptive Reasoning
by: Peng, Zhenghao "Mark", et al.
Published: (2025)
ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
by: Song, Wenxuan, et al.
Published: (2025)
X-DiffVLA: X-Embodied Diffusion Action Heads for Vision-Language-Action Models
by: Li, Boyu, et al.
Published: (2026)
RynnVLA-002: A Unified Vision-Language-Action and World Model
by: Cen, Jun, et al.
Published: (2025)
VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
by: Wang, Zixuan, et al.
Published: (2026)
DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models
by: Xu, Zonghuan, et al.
Published: (2025)
TA-VLA: Elucidating the Design Space of Torque-aware Vision-Language-Action Models
by: Zhang, Zongzheng, et al.
Published: (2025)
VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation
by: Zhao, Han, et al.
Published: (2025)
StableVLA: Towards Robust Vision-Language-Action Models without Extra Data
by: Fu, Yiyang, et al.
Published: (2026)
RoboNurse-VLA: Robotic Scrub Nurse System based on Vision-Language-Action Model
by: Li, Shunlei, et al.
Published: (2024)
CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling
by: Li, Hao, et al.
Published: (2025)
MoS-VLA: A Vision-Language-Action Model with One-Shot Skill Adaptation
by: Zhao, Ruihan, et al.
Published: (2025)
Audio-VLA: Adding Contact Audio Perception to Vision-Language-Action Model for Robotic Manipulation
by: Wei, Xiangyi, et al.
Published: (2025)
AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models
by: Jiang, Yuhua, et al.
Published: (2025)