| Main Authors: | Hu, Xintong, Huang, Xuhong, Zhang, Jinyu, Yao, Yutong, Sun, Yuchong, Wang, Qiuyue, Li, Mingsheng, Xie, Sicheng, Liu, Yitao, Chen, Junhao, Chen, Yixuan, Zheng, Yingming, Bai, Shuai, Yu, Tao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.27284 |
Similar Items
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments
by: Wang, Qiuyue, et al.
Published: (2026)
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)
CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies
by: Du, Fan, et al.
Published: (2026)
Generating Vision-Language Navigation Instructions Incorporated Fine-Grained Alignment Annotations
by: Cui, Yibo, et al.
Published: (2025)
Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control
by: Chen, William, et al.
Published: (2026)
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
by: Yang, Shuai, et al.
Published: (2025)
Unify Robot Actions in Camera Frame
by: Xie, Sicheng, et al.
Published: (2025)
ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
by: Chen, Yuhui, et al.
Published: (2025)
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
by: Peng, Wujian, et al.
Published: (2023)
FT-NCFM: An Influence-Aware Data Distillation Framework for Efficient VLA Models
by: Chen, Kewei, et al.
Published: (2025)
HieroAction: Hierarchically Guided VLM for Fine-Grained Action Analysis
by: Wu, Junhao, et al.
Published: (2025)
ATAAT: Adaptive Threat-Aware Adversarial Tuning Framework against Backdoor Attacks on Vision-Language-Action Models
by: Chen, Kewei, et al.
Published: (2026)
Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation
by: Chen, Wenting, et al.
Published: (2023)
Medical Image Synthesis via Fine-Grained Image-Text Alignment and Anatomy-Pathology Prompting
by: Chen, Wenting, et al.
Published: (2024)
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
by: Li, Hengtao, et al.
Published: (2025)
STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models
by: Xu, Feng, et al.
Published: (2025)
Towards Fine-Grained Vision-Language Alignment for Few-Shot Anomaly Detection
by: Fan, Yuanting, et al.
Published: (2025)
Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models
by: Lyu, Mingyang, et al.
Published: (2025)
QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
by: Li, Yixuan, et al.
Published: (2025)
Enhancing Linguistic Generalization of VLA: Fine-Tuning OpenVLA via Synthetic Instruction Augmentation
by: Shin, Dongik
Published: (2026)
SmoothVLA: Aligning Vision-Language-Action Models with Physical Constraints via Intrinsic Smoothness Optimization
by: Li, Jiashun, et al.
Published: (2026)
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
by: Cui, Chenhang, et al.
Published: (2024)
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
by: Chen, Xinyi, et al.
Published: (2025)
Fine-Grained Action Segmentation for Renorrhaphy in Robot-Assisted Partial Nephrectomy
by: Dai, Jiaheng, et al.
Published: (2026)
Fine-Grained Instruction-Guided Graph Reasoning for Vision-and-Language Navigation
by: Liu, Yaohua, et al.
Published: (2025)
DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping
by: Zhong, Yifan, et al.
Published: (2025)
Decomposed Vision-Language Alignment for Fine-Grained Open-Vocabulary Segmentation
by: Wang, Chenhao, et al.
Published: (2026)
Fine-Grained Alignment in Vision-and-Language Navigation through Bayesian Optimization
by: Song, Yuhang, et al.
Published: (2024)
ITSELF: Attention Guided Fine-Grained Alignment for Vision-Language Retrieval
by: Nguyen, Tien-Huy, et al.
Published: (2026)
DyQ-VLA: Temporal-Dynamic-Aware Quantization for Embodied Vision-Language-Action Models
by: Zheng, Zihao, et al.
Published: (2026)
EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models
by: Bai, Zechen, et al.
Published: (2025)
ST4VLA: Spatially Guided Training for Vision-Language-Action Models
by: Ye, Jinhui, et al.
Published: (2026)
AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models
by: Hu, Yutong, et al.
Published: (2026)
CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment
by: Zhou, Kanglei, et al.
Published: (2024)
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
by: Niu, Dantong, et al.
Published: (2024)
Seeing as Experts Do: A Knowledge-Augmented Agent for Open-Set Fine-Grained Visual Understanding
by: Chen, Junhan, et al.
Published: (2026)
VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning
by: Wang, Chaoyang, et al.
Published: (2026)
From Coarse to Fine: Benchmarking and Reward Modeling for Writing-Centric Generation Tasks
by: Ren, Qingyu, et al.
Published: (2026)
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
by: Zhang, Borong, et al.
Published: (2025)
Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion
by: Li, Zhuo, et al.
Published: (2025)