| Main Authors: | Wang, Guodong, Zhang, Chenkai, Liu, Qingjie, Zhang, Jinjin, Cai, Jiancheng, Liu, Junjie, Liu, Xinmin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.06556 |
Similar Items
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
by: Fei, Senyu, et al.
Published: (2025)
On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
by: Guo, Jianing, et al.
Published: (2025)
LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization
by: Zhou, Xueyang, et al.
Published: (2025)
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
by: Zheng, Jinliang, et al.
Published: (2025)
ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models
by: Tang, Zuojin, et al.
Published: (2026)
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
by: Zhang, Zhengshen, et al.
Published: (2025)
UrbanVLA: A Vision-Language-Action Model for Urban Micromobility
by: Li, Anqi, et al.
Published: (2025)
Towards Long-horizon Embodied Agents with Tool-Aligned Vision-Language-Action Models
by: Lei, Zixing, et al.
Published: (2026)
Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation
by: Kim, Ju-Young, et al.
Published: (2025)
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
by: Gao, Chongkai, et al.
Published: (2025)
ROSA: Harnessing Robot States for Vision-Language and Action Alignment
by: Wen, Yuqing, et al.
Published: (2025)
ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge
by: Zhou, Zhongyi, et al.
Published: (2025)
A Survey on Efficient Vision-Language-Action Models
by: Yu, Zhaoshu, et al.
Published: (2025)
StarVLA-$α$: Reducing Complexity in Vision-Language-Action Systems
by: Ye, Jinhui, et al.
Published: (2026)
Universal Actions for Enhanced Embodied Foundation Models
by: Zheng, Jinliang, et al.
Published: (2025)
Towards the Vision-Sound-Language-Action Paradigm: The HEAR Framework for Sound-Centric Manipulation
by: Nie, Chang, et al.
Published: (2026)
PEAfowl: Perception-Enhanced Multi-View Vision-Language-Action for Bimanual Manipulation
by: Fan, Qingyu, et al.
Published: (2026)
A Survey on Vision-Language-Action Models for Autonomous Driving
by: Jiang, Sicong, et al.
Published: (2025)
LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models
by: Shen, Boyang, et al.
Published: (2026)
TTF-VLA: Temporal Token Fusion via Pixel-Attention Integration for Vision-Language-Action Models
by: Liu, Chenghao, et al.
Published: (2025)
NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly?
by: Peng, Jierui, et al.
Published: (2025)
SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
by: Wang, Hanzhen, et al.
Published: (2025)
IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model
by: Jiang, Anqing, et al.
Published: (2025)
VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning
by: Wang, Chaoyang, et al.
Published: (2026)
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
by: Zhao, Qingqing, et al.
Published: (2025)
A Large Vision-Language Model based Environment Perception System for Visually Impaired People
by: Chen, Zezhou, et al.
Published: (2025)
VITA: Vision-to-Action Flow Matching Policy
by: Gao, Dechen, et al.
Published: (2025)
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
by: Chen, Xinyi, et al.
Published: (2025)
VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting
by: Lin, Juyi, et al.
Published: (2025)
Hybrid Training for Vision-Language-Action Models
by: Mazzaglia, Pietro, et al.
Published: (2025)
VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models
by: Si, Shengyu, et al.
Published: (2026)
StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing
by: Community, StarVLA
Published: (2026)
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
by: Yang, Ruihan, et al.
Published: (2025)
HiST-VLA: A Hierarchical Spatio-Temporal Vision-Language-Action Model for End-to-End Autonomous Driving
by: Wang, Yiru, et al.
Published: (2026)
Recognizing Actions from Robotic View for Natural Human-Robot Interaction
by: Wang, Ziyi, et al.
Published: (2025)
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
by: Hung, Chia-Yu, et al.
Published: (2025)
OG-VLA: Orthographic Image Generation for 3D-Aware Vision-Language Action Model
by: Singh, Ishika, et al.
Published: (2025)
Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts
by: Wang, Yun, et al.
Published: (2025)
VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model
by: Wang, Beichen, et al.
Published: (2024)
Interactive Post-Training for Vision-Language-Action Models
by: Tan, Shuhan, et al.
Published: (2025)