| Main Authors: | Rao, Zhifeng, Chen, Wenlong, Xie, Lei, Hua, Xia, Yin, Dongfu, Tian, Zhen, Yu, F. Richard |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.10698 |
Similar Items
DepthVLA: Enhancing Vision-Language-Action Models with Depth-Aware Spatial Reasoning
by: Yuan, Tianyuan, et al.
Published: (2025)
SaiVLA-0: Cerebrum--Pons--Cerebellum Tripartite Architecture for Compute-Aware Vision-Language-Action
by: Shi, Xiang, et al.
Published: (2026)
AIR-VLA: Vision-Language-Action Systems for Aerial Manipulation
by: Sun, Jianli, et al.
Published: (2026)
GeoVLA: Empowering 3D Representations in Vision-Language-Action Models
by: Sun, Lin, et al.
Published: (2025)
EdgeVLA: Efficient Vision-Language-Action Models
by: Budzianowski, Paweł, et al.
Published: (2025)
QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
by: Li, Yixuan, et al.
Published: (2025)
3D-VLA: A 3D Vision-Language-Action Generative World Model
by: Zhen, Haoyu, et al.
Published: (2024)
GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models
by: Sarowar, Md Selim, et al.
Published: (2026)
Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance
by: Wang, Songsheng, et al.
Published: (2025)
DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models
by: Yin, Cheng, et al.
Published: (2025)
LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models
by: Shen, Boyang, et al.
Published: (2026)
AffordVLA: Injecting Affordance Representations into Vision-Language-Action Models via Implicit Feature Alignment
by: Kong, Weijie, et al.
Published: (2026)
MAP-VLA: Memory-Augmented Prompting for Vision-Language-Action Model in Robotic Manipulation
by: Li, Runhao, et al.
Published: (2025)
E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes
by: Zhai, Jiajun, et al.
Published: (2026)
AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention
by: Xiao, Lei, et al.
Published: (2025)
DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models
by: Xu, Zonghuan, et al.
Published: (2025)
PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
by: Guo, Xinyu, et al.
Published: (2026)
AC^2-VLA: Action-Context-Aware Adaptive Computation in Vision-Language-Action Models for Efficient Robotic Manipulation
by: Yu, Wenda, et al.
Published: (2026)
VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning
by: Wang, Chaoyang, et al.
Published: (2026)
AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation
by: Wu, Yangchao, et al.
Published: (2023)
RLinf-VLA: A Unified and Efficient Framework for Reinforcement Learning of Vision-Language-Action Models
by: Zang, Hongzhi, et al.
Published: (2025)
RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
by: Liufu, Weijia, et al.
Published: (2026)
FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models
by: Wang, Xin, et al.
Published: (2025)
IA-VLA: Input Augmentation for Vision-Language-Action models in settings with semantically complex tasks
by: Hannus, Eric, et al.
Published: (2025)
4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
by: Zhang, Jiahui, et al.
Published: (2025)
CRL-VLA: Continual Vision-Language-Action Learning
by: Zeng, Qixin, et al.
Published: (2026)
SwitchVLA: Execution-Aware Task Switching for Vision-Language-Action Models
by: Li, Meng, et al.
Published: (2025)
RotVLA: Rotational Latent Action for Vision-Language-Action Model
by: Li, Qiwei, et al.
Published: (2026)
ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
by: Zhong, Linqing, et al.
Published: (2026)
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
by: Xie, Haozhe, et al.
Published: (2026)
PointVLA: Injecting the 3D World into Vision-Language-Action Models
by: Li, Chengmeng, et al.
Published: (2025)
ROI-Driven Foveated Attention for Unified Egocentric Representations in Vision-Language-Action Systems
by: Sun, Xinhai, et al.
Published: (2026)
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
by: Xu, Siyu, et al.
Published: (2025)
StyleVLA: Driving Style-Aware Vision Language Action Model for Autonomous Driving
by: Gao, Yuan, et al.
Published: (2026)
ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models
by: Li, Ye, et al.
Published: (2026)
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving
by: Arai, Hidehisa, et al.
Published: (2024)
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
by: Liang, Zhixuan, et al.
Published: (2025)
StereoVLA: Enhancing Vision-Language-Action Models with Stereo Vision
by: Deng, Shengliang, et al.
Published: (2025)
CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies
by: Du, Fan, et al.
Published: (2026)
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
by: Yang, Shuai, et al.
Published: (2025)