| Main Authors: | Chen, Feng, Wang, Xianghui, Chen, Yuxuan, Li, Boying, He, Yefei, Zhang, Zeyu, Wu, Yicheng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.11567 |
Similar Items
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification
by: He, Yefei, et al.
Published: (2024)
From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models
by: Fang, Irving, et al.
Published: (2025)
LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models
by: Duan, Zicheng, et al.
Published: (2026)
EvoVLA: Self-Evolving Vision-Language-Action Model
by: Liu, Zeting, et al.
Published: (2025)
Unified Vision-Language-Action Model
by: Wang, Yuqi, et al.
Published: (2025)
DepthVLA: Enhancing Vision-Language-Action Models with Depth-Aware Spatial Reasoning
by: Yuan, Tianyuan, et al.
Published: (2025)
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
by: Ye, Angen, et al.
Published: (2025)
Action Draft and Verify: A Self-Verifying Framework for Vision-Language-Action Model
by: Zhao, Chen, et al.
Published: (2026)
FlashAR: Efficient Post-Training Acceleration for Autoregressive Image Generation
by: Zhou, Junkang, et al.
Published: (2026)
Less Detail, Better Answers: Degradation-Driven Prompting for VQA
by: Han, Haoxuan, et al.
Published: (2026)
Seeing Space and Motion: Enhancing Latent Actions with Geometric and Dynamic Awareness for Vision-Language-Action Models
by: Cai, Zhejia, et al.
Published: (2025)
GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning
by: Ma, Guoqing, et al.
Published: (2026)
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)
WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation
by: Zhao, Baining, et al.
Published: (2026)
UrbanVLA: A Vision-Language-Action Model for Urban Micromobility
by: Li, Anqi, et al.
Published: (2025)
PlanarGS: High-Fidelity Indoor 3D Gaussian Splatting Guided by Vision-Language Planar Priors
by: Jin, Xirui, et al.
Published: (2025)
EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models
by: Jiang, Feng, et al.
Published: (2025)
Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models
by: Luo, Yulin, et al.
Published: (2026)
FD-VLA: Force-Distilled Vision-Language-Action Model for Contact-Rich Manipulation
by: Zhao, Ruiteng, et al.
Published: (2026)
Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models
by: Xu, Yicheng, et al.
Published: (2024)
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
by: Xie, Haozhe, et al.
Published: (2026)
Neighboring Autoregressive Modeling for Efficient Visual Generation
by: He, Yefei, et al.
Published: (2025)
FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models
by: Wang, Xin, et al.
Published: (2025)
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
by: Yang, Ganlin, et al.
Published: (2025)
MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
by: Huang, Ting, et al.
Published: (2025)
Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models
by: He, Guangzhao, et al.
Published: (2026)
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
by: Gao, Chongkai, et al.
Published: (2025)
VLA-IAP: Training-Free Visual Token Pruning via Interaction Alignment for Vision-Language-Action Models
by: Cheng, Jintao, et al.
Published: (2026)
UAOR: Uncertainty-aware Observation Reinjection for Vision-Language-Action Models
by: Yang, Jiabing, et al.
Published: (2026)
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
by: He, Yefei, et al.
Published: (2023)
OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models
by: Zhang, Yiwei, et al.
Published: (2026)
A Self-Correcting Vision-Language-Action Model for Fast and Slow System Manipulation
by: Li, Chenxuan, et al.
Published: (2024)
QuoVLA: Quotient Space for Vision-Language-Action Models
by: Wang, Xuan, et al.
Published: (2026)
GesVLA: Gesture-Aware Vision-Language-Action Model Embedded Representations
by: Guo, Wenxuan, et al.
Published: (2026)
ActionPlan: Future-Aware Streaming Motion Synthesis via Frame-Level Action Planning
by: Nazarenus, Eric, et al.
Published: (2026)
CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies
by: Du, Fan, et al.
Published: (2026)
Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales
by: Qian, Zhaofang, et al.
Published: (2025)
RPD-Diff: Region-Adaptive Physics-Guided Diffusion Model for Visibility Enhancement under Dense and Non-Uniform Haze
by: Zhang, Ruicheng, et al.
Published: (2025)
FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via Neural Action Tokenization
by: Liu, Yicheng, et al.
Published: (2025)
ZipAR: Parallel Auto-regressive Image Generation through Spatial Locality
by: He, Yefei, et al.
Published: (2024)