| Main Authors: | Xu, Peng, Deng, Zhengnan, Deng, Jiayan, Gu, Zonghua, Wan, Shaohua |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.14363 |
Similar Items
UAV-Track VLA: Embodied Aerial Tracking via Vision-Language-Action Models
by: Zhang, Qiyao, et al.
Published: (2026)
UAV-VLN: End-to-End Vision Language guided Navigation for UAVs
by: Saxena, Pranav, et al.
Published: (2025)
HiST-VLA: A Hierarchical Spatio-Temporal Vision-Language-Action Model for End-to-End Autonomous Driving
by: Wang, Yiru, et al.
Published: (2026)
UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation
by: Sautenkov, Oleg, et al.
Published: (2025)
DIAL: Decoupling Intent and Action via Latent World Modeling for End-to-End VLA
by: Chen, Yi, et al.
Published: (2026)
MAP-VLA: Memory-Augmented Prompting for Vision-Language-Action Model in Robotic Manipulation
by: Li, Runhao, et al.
Published: (2025)
LLaDA-VLA: Vision Language Diffusion Action Models
by: Wen, Yuqing, et al.
Published: (2025)
WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation
by: Zhao, Baining, et al.
Published: (2026)
End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering
by: Goetting, Dylan, et al.
Published: (2024)
ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration
by: Zhu, Minjie, et al.
Published: (2025)
Fast-SmartWay: Panoramic-Free End-to-End Zero-Shot Vision-and-Language Navigation
by: Shi, Xiangyu, et al.
Published: (2025)
StableVLA: Towards Robust Vision-Language-Action Models without Extra Data
by: Fu, Yiyang, et al.
Published: (2026)
Cognitive-Hierarchy Guided End-to-End Planning for Autonomous Driving
by: Wang, Zhennan, et al.
Published: (2025)
ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model
by: Zhou, Zhongyi, et al.
Published: (2025)
AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving
by: Huang, Wenhui, et al.
Published: (2026)
UAV-ON: A Benchmark for Open-World Object Goal Navigation with Aerial Agents
by: Xiao, Jianqiang, et al.
Published: (2025)
RotVLA: Rotational Latent Action for Vision-Language-Action Model
by: Li, Qiwei, et al.
Published: (2026)
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
by: Yang, Zhenjie, et al.
Published: (2025)
MiVLA: Towards Generalizable Vision-Language-Action Model with Human-Robot Mutual Imitation Pre-training
by: Yin, Zhenhan, et al.
Published: (2025)
CoReVLA: A Dual-Stage End-to-End Autonomous Driving Framework for Long-Tail Scenarios via Collect-and-Refine
by: Fang, Shiyu, et al.
Published: (2025)
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
by: Jiang, Bo, et al.
Published: (2024)
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
by: Ding, Pengxiang, et al.
Published: (2023)
PointVLA: Injecting the 3D World into Vision-Language-Action Models
by: Li, Chengmeng, et al.
Published: (2025)
PD-VLA: Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
by: Song, Wenxuan, et al.
Published: (2025)
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers
by: Wang, Yating, et al.
Published: (2025)
IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model
by: Jiang, Anqing, et al.
Published: (2025)
dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought
by: Wen, Junjie, et al.
Published: (2025)
GesVLA: Gesture-Aware Vision-Language-Action Model Embedded Representations
by: Guo, Wenxuan, et al.
Published: (2026)
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation
by: Wen, Junjie, et al.
Published: (2024)
StreamingVLA: Streaming Vision-Language-Action Model with Action Flow Matching and Adaptive Early Observation
by: Shi, Yiran, et al.
Published: (2026)
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
by: Liu, Jiaming, et al.
Published: (2025)
DiffVLA++: Bridging Cognitive Reasoning and End-to-End Driving through Metric-Guided Alignment
by: Gao, Yu, et al.
Published: (2025)
Action Images: End-to-End Policy Learning via Multiview Video Generation
by: Zhen, Haoyu, et al.
Published: (2026)
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
by: Ye, Angen, et al.
Published: (2025)
VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
by: Sun, Jingwen, et al.
Published: (2026)
SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
by: Wang, Hanzhen, et al.
Published: (2025)
Latent-WAM: Latent World Action Modeling for End-to-End Autonomous Driving
by: Wang, Linbo, et al.
Published: (2026)
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
by: Xu, Siyu, et al.
Published: (2025)
DualCoT-VLA: Visual-Linguistic Chain of Thought via Parallel Reasoning for Vision-Language-Action Models
by: Zhong, Zhide, et al.
Published: (2026)
ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
by: Song, Wenxuan, et al.
Published: (2025)