| Main Authors: | Fang, Hengyu, Liu, Yijiang, Du, Yuan, Du, Li, Yang, Huanrui |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.09090 |
Similar Items
SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
by: Wang, Hanzhen, et al.
Published: (2025)
CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies
by: Du, Fan, et al.
Published: (2026)
QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models
by: Wang, Xinhao, et al.
Published: (2026)
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers
by: Wang, Yating, et al.
Published: (2025)
3D-VLA: A 3D Vision-Language-Action Generative World Model
by: Zhen, Haoyu, et al.
Published: (2024)
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)
Bridging the Semantic-Action Gap in Visual Token Pruning for Efficient VLA Inference
by: Liu, Ziyan, et al.
Published: (2025)
PAT: Pruning-Aware Tuning for Large Language Models
by: Liu, Yijiang, et al.
Published: (2024)
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
by: Gao, Chongkai, et al.
Published: (2025)
OG-VLA: Orthographic Image Generation for 3D-Aware Vision-Language Action Model
by: Singh, Ishika, et al.
Published: (2025)
VLA-Forget: Vision-Language-Action Unlearning for Embodied Foundation Models
by: Ranjan, Ravi, et al.
Published: (2026)
LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models
by: Shen, Boyang, et al.
Published: (2026)
ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations
by: Zhou, Yuhao, et al.
Published: (2026)
StarVLA-$α$: Reducing Complexity in Vision-Language-Action Systems
by: Ye, Jinhui, et al.
Published: (2026)
DepthVLA: Enhancing Vision-Language-Action Models with Depth-Aware Spatial Reasoning
by: Yuan, Tianyuan, et al.
Published: (2025)
UrbanVLA: A Vision-Language-Action Model for Urban Micromobility
by: Li, Anqi, et al.
Published: (2025)
Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients
by: Xiang, Ziwei, et al.
Published: (2026)
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
by: Li, Ye, et al.
Published: (2025)
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
by: Yang, Ruihan, et al.
Published: (2025)
VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models
by: Si, Shengyu, et al.
Published: (2026)
GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models
by: Sarowar, Md Selim, et al.
Published: (2026)
FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models
by: Wang, Xin, et al.
Published: (2025)
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
by: Chen, Peng, et al.
Published: (2025)
EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models
by: Jiang, Feng, et al.
Published: (2025)
ROCKET: Residual-Oriented Multi-Layer Alignment for Spatially-Aware Vision-Language-Action Models
by: Sun, Guoheng, et al.
Published: (2026)
VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies
by: Gao, Mingjian, et al.
Published: (2026)
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
by: Zheng, Jinliang, et al.
Published: (2025)
EvoVLA: Self-Evolving Vision-Language-Action Model
by: Liu, Zeting, et al.
Published: (2025)
MAIN-VLA: Modeling Abstraction of Intention and eNvironment for Vision-Language-Action Models
by: Zhou, Zheyuan, et al.
Published: (2026)
AugVLA-3D: Depth-Driven Feature Augmentation for Vision-Language-Action Models
by: Rao, Zhifeng, et al.
Published: (2026)
EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training
by: Du, Yiyang, et al.
Published: (2026)
IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model
by: Jiang, Anqing, et al.
Published: (2025)
StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing
by: Community, StarVLA
Published: (2026)
VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning
by: Wang, Chaoyang, et al.
Published: (2026)
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
by: Chen, Xinyi, et al.
Published: (2025)
AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models
by: Li, Jiayu, et al.
Published: (2025)
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
by: Zhao, Qingqing, et al.
Published: (2025)
QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
by: Li, Yixuan, et al.
Published: (2025)
VLA-IAP: Training-Free Visual Token Pruning via Interaction Alignment for Vision-Language-Action Models
by: Cheng, Jintao, et al.
Published: (2026)
VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness
by: Zhang, Rongyu, et al.
Published: (2024)