Library Catalog

Bibliographic Details
Main Authors:	Yuan, Tianyuan; Liu, Yicheng; Lu, Chenhao; Chen, Zhuoguang; Jiang, Tao; Zhao, Hang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.13375

Similar Items

Galaxea Open-World Dataset and G0 Dual-System VLA Model
by: Jiang, Tao, et al.
Published: (2025)

LONG3R: Long Sequence Streaming 3D Reconstruction
by: Chen, Zhuoguang, et al.
Published: (2025)

FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via Neural Action Tokenization
by: Liu, Yicheng, et al.
Published: (2025)

QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
by: Li, Yixuan, et al.
Published: (2025)

Evo-Depth: A Lightweight Depth-Enhanced Vision-Language-Action Model
by: Lin, Tao, et al.
Published: (2026)

GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models
by: Sarowar, Md Selim, et al.
Published: (2026)

Fast-WAM: Do World Action Models Need Test-time Future Imagination?
by: Yuan, Tianyuan, et al.
Published: (2026)

AugVLA-3D: Depth-Driven Feature Augmentation for Vision-Language-Action Models
by: Rao, Zhifeng, et al.
Published: (2026)

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
by: Liu, Yang, et al.
Published: (2025)

VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
by: Ye, Angen, et al.
Published: (2025)

TrackOcc: Camera-based 4D Panoptic Occupancy Tracking
by: Chen, Zhuoguang, et al.
Published: (2025)

Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving
by: Zhang, Dapeng, et al.
Published: (2025)

Complet4R: Geometric Complete 4D Reconstruction
by: Wang, Weibang, et al.
Published: (2026)

UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models
by: Govind, Manish Kumar, et al.
Published: (2026)

SLAM-Former: Putting SLAM into One Transformer
by: Yuan, Yijun, et al.
Published: (2025)

FD-VLA: Force-Distilled Vision-Language-Action Model for Contact-Rich Manipulation
by: Zhao, Ruiteng, et al.
Published: (2026)

DepthLM: Metric Depth From Vision Language Models
by: Cai, Zhipeng, et al.
Published: (2025)

$Δ$VLA: Prior-Guided Vision-Language-Action Models via World Knowledge Variation
by: Zhu, Yijie, et al.
Published: (2026)

PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors
by: Yuan, Tianyuan, et al.
Published: (2024)

GesVLA: Gesture-Aware Vision-Language-Action Model Embedded Representations
by: Guo, Wenxuan, et al.
Published: (2026)

VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
by: Sun, Jingwen, et al.
Published: (2026)

EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models
by: Jiang, Feng, et al.
Published: (2025)

Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models
by: Chi, Haohan, et al.
Published: (2025)

FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models
by: Wang, Xin, et al.
Published: (2025)

4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
by: Zhang, Jiahui, et al.
Published: (2025)

LLaDA-VLA: Vision Language Diffusion Action Models
by: Wen, Yuqing, et al.
Published: (2025)

Vision-Language Embodiment for Monocular Depth Estimation
by: Zhang, Jinchang, et al.
Published: (2025)

QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
by: Ding, Pengxiang, et al.
Published: (2023)

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
by: Zhao, Qingqing, et al.
Published: (2025)

DEAR: Depth-Enhanced Action Recognition
by: Rahmaniboldaji, Sadegh, et al.
Published: (2024)

VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)

SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
by: Wang, Hanzhen, et al.
Published: (2025)

RotVLA: Rotational Latent Action for Vision-Language-Action Model
by: Li, Qiwei, et al.
Published: (2026)

SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model
by: Zhou, Zewei, et al.
Published: (2026)

QuoVLA: Quotient Space for Vision-Language-Action Models
by: Wang, Xuan, et al.
Published: (2026)

EvoVLA: Self-Evolving Vision-Language-Action Model
by: Liu, Zeting, et al.
Published: (2025)

SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models
by: Chen, Pingyi, et al.
Published: (2025)

DualCoT-VLA: Visual-Linguistic Chain of Thought via Parallel Reasoning for Vision-Language-Action Models
by: Zhong, Zhide, et al.
Published: (2026)

Perfecting Depth: Uncertainty-Aware Enhancement of Metric Depth
by: Jun, Jinyoung, et al.
Published: (2025)

PD-VLA: Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
by: Song, Wenxuan, et al.
Published: (2025)