| Main Authors: | Sun, Xinhai, Shi, Xiang, Zou, Menglin, Huang, Wenlong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.20668 |
Similar Items
SaiVLA-0: Cerebrum–Pons–Cerebellum Tripartite Architecture for Compute-Aware Vision-Language-Action
by: Shi, Xiang, et al.
Published: (2026)
RDGen: Demonstration Generation for High-Quality Robot Learning via Reinforcement Learning
by: Zhu, Zijian, et al.
Published: (2026)
Cross-Hand Latent Representation for Vision-Language-Action Models
by: Jiang, Guangqi, et al.
Published: (2026)
Reading in the Dark with Foveated Event Vision
by: Brander, Carl, et al.
Published: (2025)
Developing Vision-Language-Action Model from Egocentric Videos
by: Yoshida, Tomoya, et al.
Published: (2025)
GeoVLA: Empowering 3D Representations in Vision-Language-Action Models
by: Sun, Lin, et al.
Published: (2025)
XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations
by: Fan, Shichao, et al.
Published: (2025)
HARP-VLA: Human-Robot Aligned Representation Learning for Vision-Language-Action Model
by: Zhu, Xiang, et al.
Published: (2026)
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
by: Yang, Ruihan, et al.
Published: (2025)
Unified Vision-Language-Action Model
by: Wang, Yuqi, et al.
Published: (2025)
HALO: A Unified Vision-Language-Action Model for Embodied Multimodal Chain-of-Thought Reasoning
by: Shou, Quanxin, et al.
Published: (2026)
Look, Focus, Act: Efficient and Robust Robot Learning via Human Gaze and Foveated Vision Transformers
by: Chuang, Ian, et al.
Published: (2025)
METIS: Multi-Source Egocentric Training for Integrated Dexterous Vision-Language-Action Model
by: Fu, Yankai, et al.
Published: (2025)
MoTVLA: A Vision-Language-Action Model with Unified Fast-Slow Reasoning
by: Huang, Wenhui, et al.
Published: (2025)
RynnVLA-002: A Unified Vision-Language-Action and World Model
by: Cen, Jun, et al.
Published: (2025)
AIR-VLA: Vision-Language-Action Systems for Aerial Manipulation
by: Sun, Jianli, et al.
Published: (2026)
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
by: Li, Qi, et al.
Published: (2026)
MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation
by: Liu, Yang, et al.
Published: (2026)
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence
by: Lin, Xiaopeng, et al.
Published: (2025)
RLinf-VLA: A Unified and Efficient Framework for Reinforcement Learning of Vision-Language-Action Models
by: Zang, Hongzhi, et al.
Published: (2025)
BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models
by: Ma, Xiaoyu, et al.
Published: (2025)
VLA-REPLICA: A Low-Cost, Reproducible Benchmark for Real-World Evaluation of Vision-Language-Action Models
by: Huang, Alex S., et al.
Published: (2026)
Geometry-Aware Sparse Depth Sampling for High-Fidelity RGB-D Depth Completion in Robotic Systems
by: Salloom, Tony, et al.
Published: (2025)
LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks
by: Yang, Yi, et al.
Published: (2025)
Dexbotic: Open-Source Vision-Language-Action Toolbox
by: Xie, Bin, et al.
Published: (2025)
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
by: Lin, Fanqi, et al.
Published: (2025)
CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
by: Li, Xiaoqi, et al.
Published: (2025)
Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance
by: Zhang, Yang, et al.
Published: (2025)
LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation
by: Wang, Zhijie, et al.
Published: (2024)
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
by: Yu, Qiaojun, et al.
Published: (2024)
DyQ-VLA: Temporal-Dynamic-Aware Quantization for Embodied Vision-Language-Action Models
by: Zheng, Zihao, et al.
Published: (2026)
HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models
by: Lin, Minghui, et al.
Published: (2025)
A Survey on Vision-Language-Action Models for Autonomous Driving
by: Jiang, Sicong, et al.
Published: (2025)
Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation
by: Pei, Xiaohuan, et al.
Published: (2025)
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
by: Zhong, Yifan, et al.
Published: (2025)
Adversarial Attacks on Robotic Vision Language Action Models
by: Jones, Eliot Krzysztof, et al.
Published: (2025)
RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
by: Liufu, Weijia, et al.
Published: (2026)
World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems
by: Li, Runze, et al.
Published: (2026)
RealMirror: A Comprehensive, Open-Source Vision-Language-Action Platform for Embodied AI
by: Tai, Cong, et al.
Published: (2025)
TriVLA: A Triple-System-Based Unified Vision-Language-Action Model with Episodic World Modeling for General Robot Control
by: Liu, Zhenyang, et al.
Published: (2025)