| Main Authors: | Zhang, Hanxin, Xu, Mingshuo, Dhafer, Abdulqader, Yue, Shigang, Dong, Hongbiao, Hao, Zhou Daniel |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.00321 |
Similar Items
A Generative System for Robot-to-Human Handovers: from Intent Inference to Spatial Configuration Imagery
by: Zhang, Hanxin, et al.
Published: (2025)
Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models
by: Xu, Haiweng, et al.
Published: (2026)
Survey of Vision-Language-Action Models for Embodied Manipulation
by: Li, Haoran, et al.
Published: (2025)
X-DiffVLA: X-Embodied Diffusion Action Heads for Vision-Language-Action Models
by: Li, Boyu, et al.
Published: (2026)
DM0: An Embodied-Native Vision-Language-Action Model towards Physical AI
by: Yu, En, et al.
Published: (2026)
Agentic Robot: A Brain-Inspired Framework for Vision-Language-Action Models in Embodied Agents
by: Yang, Zhejian, et al.
Published: (2025)
VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
by: Sun, Jingwen, et al.
Published: (2026)
HALO: A Unified Vision-Language-Action Model for Embodied Multimodal Chain-of-Thought Reasoning
by: Shou, Quanxin, et al.
Published: (2026)
Embodied Learning of Reward for Musculoskeletal Control with Vision Language Models
by: Soedarmadji, Saraswati, et al.
Published: (2025)
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
by: Lv, Qi, et al.
Published: (2025)
Embodied Scene Understanding for Vision Language Models via MetaVQA
by: Wang, Weizhen, et al.
Published: (2025)
MEM: Multi-Scale Embodied Memory for Vision Language Action Models
by: Torne, Marcel, et al.
Published: (2026)
Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models
by: Ling, Yiran, et al.
Published: (2026)
Embodied3DBench: Benchmarking Low-Level Embodied Spatial Intelligence of Vision Language Models
by: Zhang, Jiyao, et al.
Published: (2026)
Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control
by: Chen, William, et al.
Published: (2026)
Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey
by: Guan, Weifan, et al.
Published: (2025)
TA-VLA: Elucidating the Design Space of Torque-aware Vision-Language-Action Models
by: Zhang, Zongzheng, et al.
Published: (2025)
Uni-LaViRA: Language-Vision-Robot Actions Translation for Unified Embodied Navigation
by: Ding, Hongyu, et al.
Published: (2026)
A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter
by: Xu, Kechun, et al.
Published: (2023)
Toward Embodiment Equivariant Vision-Language-Action Policy
by: Chen, Anzhe, et al.
Published: (2025)
ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge
by: Zhou, Zhongyi, et al.
Published: (2025)
CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
by: Li, Xiaoqi, et al.
Published: (2025)
Understanding the Impact of Geometric Foundation Models on Vision-Language-Action Models
by: Yang, Yurou, et al.
Published: (2026)
DyQ-VLA: Temporal-Dynamic-Aware Quantization for Embodied Vision-Language-Action Models
by: Zheng, Zihao, et al.
Published: (2026)
UAV-Track VLA: Embodied Aerial Tracking via Vision-Language-Action Models
by: Zhang, Qiyao, et al.
Published: (2026)
Stable Language Guidance for Vision-Language-Action Models
by: Zhan, Zhihao, et al.
Published: (2026)
AT-VLA: Adaptive Tactile Injection for Enhanced Feedback Reaction in Vision-Language-Action Models
by: Li, Xiaoqi, et al.
Published: (2026)
InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning
by: Zhang, Ji, et al.
Published: (2025)
HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness
by: Zheng, Zihao, et al.
Published: (2026)
Action Hallucination in Generative Vision-Language-Action Models
by: Soh, Harold, et al.
Published: (2026)
RealMirror: A Comprehensive, Open-Source Vision-Language-Action Platform for Embodied AI
by: Tai, Cong, et al.
Published: (2025)
HMR-1: Hierarchical Massage Robot with Vision-Language-Model for Embodied Healthcare
by: Xu, Rongtao, et al.
Published: (2026)
From Inference Efficiency to Embodied Efficiency: Revisiting Efficiency Metrics for Vision-Language-Action Models
by: Li, Zhuofan, et al.
Published: (2026)
Towards Long-horizon Embodied Agents with Tool-Aligned Vision-Language-Action Models
by: Lei, Zixing, et al.
Published: (2026)
vSTMD: Visual Motion Detection for Extremely Tiny Target at Various Velocities
by: Xu, Mingshuo, et al.
Published: (2025)
VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation
by: Zhang, Chaofan, et al.
Published: (2025)
MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation
by: Liu, Zhuoyang, et al.
Published: (2025)
Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models
by: Jin, Ruixing, et al.
Published: (2026)
Reshaping Action Error Distributions for Reliable Vision-Language-Action Models
by: Bai, Shuanghao, et al.
Published: (2026)
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
by: Liang, Wenqi, et al.
Published: (2025)