| Main Authors: | Budzianowski, Paweł, Maa, Wesley, Freed, Matthew, Mo, Jingxiang, Hsiao, Winston, Xie, Aaron, Młoduchowski, Tomasz, Tipnis, Viraj, Bolte, Benjamin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.14049 |
Similar Items
VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments
by: Wu, Yuze, et al.
Published: (2025)
HyperVLA: Efficient Inference in Vision-Language-Action Models via Hypernetworks
by: Xiong, Zheng, et al.
Published: (2025)
Lite VLA: Efficient Vision-Language-Action Control on CPU-Bound Edge Robots
by: Williams, Justin, et al.
Published: (2025)
PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
by: Guo, Xinyu, et al.
Published: (2026)
ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models
by: Li, Ye, et al.
Published: (2026)
RLinf-VLA: A Unified and Efficient Framework for Reinforcement Learning of Vision-Language-Action Models
by: Zang, Hongzhi, et al.
Published: (2025)
GeoVLA: Empowering 3D Representations in Vision-Language-Action Models
by: Sun, Lin, et al.
Published: (2025)
ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
by: Zhong, Linqing, et al.
Published: (2026)
StereoVLA: Enhancing Vision-Language-Action Models with Stereo Vision
by: Deng, Shengliang, et al.
Published: (2025)
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
by: Shukor, Mustafa, et al.
Published: (2025)
Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models
by: Jin, Ruofan, et al.
Published: (2026)
DySL-VLA: Efficient Vision-Language-Action Model Inference via Dynamic-Static Layer-Skipping for Robot Manipulation
by: Yang, Zebin, et al.
Published: (2026)
OpenVLA: An Open-Source Vision-Language-Action Model
by: Kim, Moo Jin, et al.
Published: (2024)
STRONG-VLA: Decoupled Robustness Learning for Vision-Language-Action Models under Multimodal Perturbations
by: Xie, Yuhan, et al.
Published: (2026)
SG-VLA: Learning Spatially-Grounded Vision-Language-Action Models for Mobile Manipulation
by: Tu, Ruisen, et al.
Published: (2026)
AIR-VLA: Vision-Language-Action Systems for Aerial Manipulation
by: Sun, Jianli, et al.
Published: (2026)
OpenGVL -- Benchmarking Visual Temporal Progress for Data Curation
by: Budzianowski, Paweł, et al.
Published: (2025)
Pheme: Efficient and Conversational Speech Generation
by: Budzianowski, Paweł, et al.
Published: (2024)
STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models
by: Xu, Feng, et al.
Published: (2025)
StreamingVLA: Streaming Vision-Language-Action Model with Action Flow Matching and Adaptive Early Observation
by: Shi, Yiran, et al.
Published: (2026)
AC^2-VLA: Action-Context-Aware Adaptive Computation in Vision-Language-Action Models for Efficient Robotic Manipulation
by: Yu, Wenda, et al.
Published: (2026)
AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models
by: Hu, Yutong, et al.
Published: (2026)
FocusVLA: Focused Visual Utilization for Vision-Language-Action Models
by: Zhang, Yichi, et al.
Published: (2026)
VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
by: Wang, Zixuan, et al.
Published: (2026)
RedVLA: Physical Red Teaming for Vision-Language-Action Models
by: Zhang, Yuhao, et al.
Published: (2026)
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots
by: Apanasevich, I., et al.
Published: (2026)
NS-VLA: Towards Neuro-Symbolic Vision-Language-Action Models
by: Zhu, Ziyue, et al.
Published: (2026)
FutureVLA: Joint Visuomotor Prediction for Vision-Language-Action Model
by: Xu, Xiaoxu, et al.
Published: (2026)
Audio-VLA: Adding Contact Audio Perception to Vision-Language-Action Model for Robotic Manipulation
by: Wei, Xiangyi, et al.
Published: (2025)
FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction
by: Yang, Yifan, et al.
Published: (2025)
X-DiffVLA: X-Embodied Diffusion Action Heads for Vision-Language-Action Models
by: Li, Boyu, et al.
Published: (2026)
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
by: Xie, Haozhe, et al.
Published: (2026)
RotVLA: Rotational Latent Action for Vision-Language-Action Model
by: Li, Qiwei, et al.
Published: (2026)
CRL-VLA: Continual Vision-Language-Action Learning
by: Zeng, Qixin, et al.
Published: (2026)
SmoothVLA: Aligning Vision-Language-Action Models with Physical Constraints via Intrinsic Smoothness Optimization
by: Li, Jiashun, et al.
Published: (2026)
GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model
by: Abouzeid, Ali, et al.
Published: (2025)
RationalVLA: A Rational Vision-Language-Action Model with Dual System
by: Song, Wenxuan, et al.
Published: (2025)
Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models
by: Bai, Shuanghao, et al.
Published: (2026)
RoVLA: Multi-Consistency Constraints for Robust Vision-Language-Action Models
by: Luo, Jingzhou, et al.
Published: (2026)
PAPO-VLA: Planning-Aware Policy Optimization for Vision-Language-Action Models
by: Guo, Peizheng, et al.
Published: (2026)