| Main Authors: | Rawal, Ishaan, Gupta, Shubh, Hu, Yihan, Zhan, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.21172 |
Similar Items
Interactive Video Generation via Domain Adaptation
by: Rawal, Ishaan, et al.
Published: (2025)
Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
by: Rawal, Ishaan Singh, et al.
Published: (2023)
VLADriver-RAG: Retrieval-Augmented Vision-Language-Action Models for Autonomous Driving
by: Zhao, Rui, et al.
Published: (2026)
Learning Vision-Language-Action World Models for Autonomous Driving
by: Wang, Guoqing, et al.
Published: (2026)
A Survey on Vision-Language-Action Models for Autonomous Driving
by: Jiang, Sicong, et al.
Published: (2025)
Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight
by: Yang, Yi, et al.
Published: (2025)
MedSAGa: Few-shot Memory Efficient Medical Image Segmentation using Gradient Low-Rank Projection in SAM
by: Mahla, Navyansh, et al.
Published: (2024)
CARE Drive: A Framework for Evaluating Reason-Responsiveness of Vision Language Models in Automated Driving
by: Suryana, Lucas Elbert, et al.
Published: (2026)
ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations
by: Zhou, Yuhao, et al.
Published: (2026)
LVDrive: Latent Visual Representation Enhanced Vision-Language-Action Autonomous Driving Model
by: Mei, Xiaodong, et al.
Published: (2026)
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
by: Zhang, Congzhi, et al.
Published: (2025)
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)
ATA: Bridging Implicit Reasoning with Attention-Guided and Action-Guided Inference for Vision-Language Action Models
by: Yang, Cheng, et al.
Published: (2026)
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
by: Yang, Zhenjie, et al.
Published: (2025)
RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving
by: Wang, Yujin, et al.
Published: (2025)
E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving
by: Tang, Yihong, et al.
Published: (2025)
Less is More: Lean yet Powerful Vision-Language Model for Autonomous Driving
by: Yang, Sheng, et al.
Published: (2025)
HMVLM: Multistage Reasoning-Enhanced Vision-Language Model for Long-Tailed Driving Scenarios
by: Wang, Daming, et al.
Published: (2025)
Prompting Large Vision-Language Models for Compositional Reasoning
by: Ossowski, Timothy, et al.
Published: (2024)
Does Visual Information Play a Decisive Role in Vision-Language-Action Model Driving Behavior?
by: He, Jingtao, et al.
Published: (2026)
DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models
by: Hao, Yuhan, et al.
Published: (2025)
Vision Language Models in Autonomous Driving: A Survey and Outlook
by: Zhou, Xingcheng, et al.
Published: (2023)
Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning
by: Wu, Aodi, et al.
Published: (2025)
CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies
by: Du, Fan, et al.
Published: (2026)
Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving
by: Gopalkrishnan, Akshay, et al.
Published: (2024)
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving
by: Zheng, Peiru, et al.
Published: (2024)
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
by: Pandya, Pranshu, et al.
Published: (2024)
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
by: Chen, Peng, et al.
Published: (2025)
DeeAD: Dynamic Early Exit of Vision-Language Action for Efficient Autonomous Driving
by: Hu, Haibo, et al.
Published: (2025)
VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning
by: Wang, Chaoyang, et al.
Published: (2026)
DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale
by: Zuo, Sicheng, et al.
Published: (2026)
Offline Semantic Guidance for Efficient Vision-Language-Action Policy Distillation
by: Shi, Jin, et al.
Published: (2026)
Survey on Vision-Language-Action Models
by: Adilkhanov, Adilzhan, et al.
Published: (2025)
Agentic Surgical AI: Surgeon Style Fingerprinting and Privacy Risk Quantification via Discrete Diffusion in a Vision-Language-Action Framework
by: Zhan, Huixin, et al.
Published: (2025)
NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
by: Tian, Kexin, et al.
Published: (2025)
Structured Labeling Enables Faster Vision-Language Models for End-to-End Autonomous Driving
by: Jiang, Hao, et al.
Published: (2025)
AD-EE: Early Exiting for Fast and Reliable Vision-Language Models in Autonomous Driving
by: Huang, Lianming, et al.
Published: (2025)
SpaRE: Enhancing Spatial Reasoning in Vision-Language Models with Synthetic Data
by: Ogezi, Michael, et al.
Published: (2025)
A Survey on Efficient Vision-Language-Action Models
by: Yu, Zhaoshu, et al.
Published: (2025)
Vision Language Model-Empowered Contract Theory for AIGC Task Allocation in Teleoperation
by: Zhan, Zijun, et al.
Published: (2024)