| Main Authors: | Zhang, Ninghao, Zhu, Bin, Zhou, Shijie, Chen, Jingjing |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.06001 |
Similar Items
Grounding Driving VLA via Inverse Kinematics
by: Park, Junsung, et al.
Published: (2026)
Self-Correcting VLA: Online Action Refinement via Sparse World Imagination
by: Liu, Chenyv, et al.
Published: (2026)
IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model
by: Jiang, Anqing, et al.
Published: (2025)
IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation
by: Lian, Shijie, et al.
Published: (2026)
Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models
by: Zhang, Yanyan, et al.
Published: (2026)
HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
by: Yang, Tianshuo, et al.
Published: (2026)
TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking
by: Liu, Jiahang, et al.
Published: (2025)
TTF-VLA: Temporal Token Fusion via Pixel-Attention Integration for Vision-Language-Action Models
by: Liu, Chenghao, et al.
Published: (2025)
VLANeXt: Recipes for Building Strong VLA Models
by: Wu, Xiao-Ming, et al.
Published: (2026)
ChainFlow-VLA: Causal Flow Planning with Vision-Language Models
by: Wang, Xiyang, et al.
Published: (2026)
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
by: Zheng, Jinliang, et al.
Published: (2025)
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
by: Chen, Xinyi, et al.
Published: (2025)
WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control
by: Jiang, Haoran, et al.
Published: (2025)
ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge
by: Zhou, Zhongyi, et al.
Published: (2025)
UrbanVLA: A Vision-Language-Action Model for Urban Micromobility
by: Li, Anqi, et al.
Published: (2025)
AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots
by: Zhang, Likui, et al.
Published: (2026)
SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
by: Wang, Hanzhen, et al.
Published: (2025)
Closed-Loop Action Chunks with Dynamic Corrections for Training-Free Diffusion Policy
by: Wu, Pengyuan, et al.
Published: (2026)
DIAL: Decoupling Intent and Action via Latent World Modeling for End-to-End VLA
by: Chen, Yi, et al.
Published: (2026)
E0: Enhancing Generalization and Fine-Grained Control in VLA Models via Tweedie Discrete Diffusion
by: Zhan, Zhihao, et al.
Published: (2025)
iFlyBot-VLA Technical Report
by: Zhang, Yuan, et al.
Published: (2025)
ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation
by: Chu, Zedong, et al.
Published: (2026)
Spatial Traces: Enhancing VLA Models with Spatial-Temporal Understanding
by: Patratskiy, Maxim A., et al.
Published: (2025)
AerialVLA: A Vision-Language-Action Model for UAV Navigation via Minimalist End-to-End Control
by: Xu, Peng, et al.
Published: (2026)
StarVLA-$α$: Reducing Complexity in Vision-Language-Action Systems
by: Ye, Jinhui, et al.
Published: (2026)
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
by: Gao, Chongkai, et al.
Published: (2025)
WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving
by: Xu, Mingwang, et al.
Published: (2025)
LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models
by: Shen, Boyang, et al.
Published: (2026)
Is VLA Reasoning Faithful? Probing Safety of Chain-of-Causation in Autonomous Driving Models
by: Mayumu, Nicanor, et al.
Published: (2026)
VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models
by: Si, Shengyu, et al.
Published: (2026)
StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing
by: Community, StarVLA
Published: (2026)
VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning
by: Wang, Chaoyang, et al.
Published: (2026)
IntentionVLA: Generalizable and Efficient Embodied Intention Reasoning for Human-Robot Interaction
by: Chen, Yandu, et al.
Published: (2025)
DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning
by: Zhou, Yang, et al.
Published: (2026)
Gradient-Guided Parameter Mask for Multi-Scenario Image Restoration Under Adverse Weather
by: Guo, Jilong, et al.
Published: (2024)
DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving
by: Jiang, Anqing, et al.
Published: (2025)
Ego-Grounding for Personalized Question-Answering in Egocentric Videos
by: Xiao, Junbin, et al.
Published: (2026)
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration
by: Ni, Chaojun, et al.
Published: (2024)
OG-VLA: Orthographic Image Generation for 3D-Aware Vision-Language Action Model
by: Singh, Ishika, et al.
Published: (2025)
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
by: Shen, Yichao, et al.
Published: (2025)