| Main Authors: | Yuan, Tianyuan, Liu, Yicheng, Lu, Chenhao, Chen, Zhuoguang, Jiang, Tao, Zhao, Hang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.13375 |
Similar Items
Galaxea Open-World Dataset and G0 Dual-System VLA Model
by: Jiang, Tao, et al.
Published: (2025)
LONG3R: Long Sequence Streaming 3D Reconstruction
by: Chen, Zhuoguang, et al.
Published: (2025)
FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via Neural Action Tokenization
by: Liu, Yicheng, et al.
Published: (2025)
QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
by: Li, Yixuan, et al.
Published: (2025)
Evo-Depth: A Lightweight Depth-Enhanced Vision-Language-Action Model
by: Lin, Tao, et al.
Published: (2026)
GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models
by: Sarowar, Md Selim, et al.
Published: (2026)
Fast-WAM: Do World Action Models Need Test-time Future Imagination?
by: Yuan, Tianyuan, et al.
Published: (2026)
AugVLA-3D: Depth-Driven Feature Augmentation for Vision-Language-Action Models
by: Rao, Zhifeng, et al.
Published: (2026)
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
by: Liu, Yang, et al.
Published: (2025)
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
by: Ye, Angen, et al.
Published: (2025)
TrackOcc: Camera-based 4D Panoptic Occupancy Tracking
by: Chen, Zhuoguang, et al.
Published: (2025)
Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving
by: Zhang, Dapeng, et al.
Published: (2025)
Complet4R: Geometric Complete 4D Reconstruction
by: Wang, Weibang, et al.
Published: (2026)
UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models
by: Govind, Manish Kumar, et al.
Published: (2026)
SLAM-Former: Putting SLAM into One Transformer
by: Yuan, Yijun, et al.
Published: (2025)
FD-VLA: Force-Distilled Vision-Language-Action Model for Contact-Rich Manipulation
by: Zhao, Ruiteng, et al.
Published: (2026)
DepthLM: Metric Depth From Vision Language Models
by: Cai, Zhipeng, et al.
Published: (2025)
ΔVLA: Prior-Guided Vision-Language-Action Models via World Knowledge Variation
by: Zhu, Yijie, et al.
Published: (2026)
PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors
by: Yuan, Tianyuan, et al.
Published: (2024)
GesVLA: Gesture-Aware Vision-Language-Action Model Embedded Representations
by: Guo, Wenxuan, et al.
Published: (2026)
VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
by: Sun, Jingwen, et al.
Published: (2026)
EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models
by: Jiang, Feng, et al.
Published: (2025)
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models
by: Chi, Haohan, et al.
Published: (2025)
FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models
by: Wang, Xin, et al.
Published: (2025)
4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
by: Zhang, Jiahui, et al.
Published: (2025)
LLaDA-VLA: Vision Language Diffusion Action Models
by: Wen, Yuqing, et al.
Published: (2025)
Vision-Language Embodiment for Monocular Depth Estimation
by: Zhang, Jinchang, et al.
Published: (2025)
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
by: Ding, Pengxiang, et al.
Published: (2023)
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
by: Zhao, Qingqing, et al.
Published: (2025)
DEAR: Depth-Enhanced Action Recognition
by: Rahmaniboldaji, Sadegh, et al.
Published: (2024)
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)
SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
by: Wang, Hanzhen, et al.
Published: (2025)
RotVLA: Rotational Latent Action for Vision-Language-Action Model
by: Li, Qiwei, et al.
Published: (2026)
SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model
by: Zhou, Zewei, et al.
Published: (2026)
QuoVLA: Quotient Space for Vision-Language-Action Models
by: Wang, Xuan, et al.
Published: (2026)
EvoVLA: Self-Evolving Vision-Language-Action Model
by: Liu, Zeting, et al.
Published: (2025)
SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models
by: Chen, Pingyi, et al.
Published: (2025)
DualCoT-VLA: Visual-Linguistic Chain of Thought via Parallel Reasoning for Vision-Language-Action Models
by: Zhong, Zhide, et al.
Published: (2026)
Perfecting Depth: Uncertainty-Aware Enhancement of Metric Depth
by: Jun, Jinyoung, et al.
Published: (2025)
PD-VLA: Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
by: Song, Wenxuan, et al.
Published: (2025)