| Main Authors: | Zhou, Zewei, Yang, Ruining, Xuewei, Qi, Guo, Yiluan, Chen, Sherry X., Feng, Tao, Pistunova, Kateryna, Shen, Yishan, Su, Lili, Ma, Jiaqi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.19710 |
Similar Items
nuReasoning: A Reasoning-Centric Dataset and Benchmark for Long-Tail Autonomous Driving
by: Huang, Zhiyu, et al.
Published: (2026)
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
by: Zhou, Zewei, et al.
Published: (2025)
Bridging the Semantic-Action Gap in Visual Token Pruning for Efficient VLA Inference
by: Liu, Ziyan, et al.
Published: (2025)
EdgeVLA: Efficient Vision-Language-Action Models
by: Budzianowski, Paweł, et al.
Published: (2025)
CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies
by: Du, Fan, et al.
Published: (2026)
ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models
by: Li, Ye, et al.
Published: (2026)
RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
by: Liufu, Weijia, et al.
Published: (2026)
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
by: Xu, Siyu, et al.
Published: (2025)
StereoVLA: Enhancing Vision-Language-Action Models with Stereo Vision
by: Deng, Shengliang, et al.
Published: (2025)
RotVLA: Rotational Latent Action for Vision-Language-Action Model
by: Li, Qiwei, et al.
Published: (2026)
ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
by: Zhong, Linqing, et al.
Published: (2026)
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
by: Chen, Peng, et al.
Published: (2025)
Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models
by: Jin, Ruofan, et al.
Published: (2026)
AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models
by: Jiang, Yuhua, et al.
Published: (2025)
AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models
by: Hu, Yutong, et al.
Published: (2026)
DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models
by: Xu, Zonghuan, et al.
Published: (2025)
FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models
by: Wang, Xin, et al.
Published: (2025)
RedVLA: Physical Red Teaming for Vision-Language-Action Models
by: Zhang, Yuhao, et al.
Published: (2026)
Pure Vision Language Action (VLA) Models: A Comprehensive Survey
by: Zhang, Dapeng, et al.
Published: (2025)
UrbanVLA: A Vision-Language-Action Model for Urban Micromobility
by: Li, Anqi, et al.
Published: (2025)
HyperVLA: Efficient Inference in Vision-Language-Action Models via Hypernetworks
by: Xiong, Zheng, et al.
Published: (2025)
EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models
by: Yang, Yantai, et al.
Published: (2025)
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
by: Shukor, Mustafa, et al.
Published: (2025)
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation
by: Wen, Junjie, et al.
Published: (2024)
CRL-VLA: Continual Vision-Language-Action Learning
by: Zeng, Qixin, et al.
Published: (2026)
VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation
by: Dong, Shaoqi, et al.
Published: (2025)
AC^2-VLA: Action-Context-Aware Adaptive Computation in Vision-Language-Action Models for Efficient Robotic Manipulation
by: Yu, Wenda, et al.
Published: (2026)
LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models
by: Shen, Boyang, et al.
Published: (2026)
TIC-VLA: A Think-in-Control Vision-Language-Action Model for Robot Navigation in Dynamic Environments
by: Huang, Zhiyu, et al.
Published: (2026)
VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models
by: Zhang, Borong, et al.
Published: (2025)
VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
by: Sun, Jingwen, et al.
Published: (2026)
VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments
by: Wu, Yuze, et al.
Published: (2025)
Lite VLA: Efficient Vision-Language-Action Control on CPU-Bound Edge Robots
by: Williams, Justin, et al.
Published: (2025)
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
by: Liang, Zhixuan, et al.
Published: (2025)
X-DiffVLA: X-Embodied Diffusion Action Heads for Vision-Language-Action Models
by: Li, Boyu, et al.
Published: (2026)
DepthVLA: Enhancing Vision-Language-Action Models with Depth-Aware Spatial Reasoning
by: Yuan, Tianyuan, et al.
Published: (2025)
OpenVLA: An Open-Source Vision-Language-Action Model
by: Kim, Moo Jin, et al.
Published: (2024)
QuoVLA: Quotient Space for Vision-Language-Action Models
by: Wang, Xuan, et al.
Published: (2026)
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
by: Ding, Pengxiang, et al.
Published: (2023)
LLaDA-VLA: Vision Language Diffusion Action Models
by: Wen, Yuqing, et al.
Published: (2025)