Staff View: NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning :: Library Catalog

Bibliographic Details
Main Authors:	Rawal, Ishaan; Gupta, Shubh; Hu, Yihan; Zhan, Wei
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.21172

_version_	1866914351803269120
author	Rawal, Ishaan; Gupta, Shubh; Hu, Yihan; Zhan, Wei
author_facet	Rawal, Ishaan; Gupta, Shubh; Hu, Yihan; Zhan, Wei
contents	Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we address both challenges with NoRD (No Reasoning for Driving). Compared to existing VLAs, NoRD achieves competitive performance while being fine-tuned on <60% of the data and with no reasoning annotations, resulting in 3x fewer tokens. We identify that standard Group Relative Policy Optimization (GRPO) fails to yield significant improvements when applied to policies trained on such small, reasoning-free datasets. We show that this limitation stems from difficulty bias, which disproportionately penalizes reward signals from scenarios that produce high-variance rollouts within GRPO. NoRD overcomes this by incorporating Dr. GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NoRD achieves competitive performance on Waymo and NAVSIM with a fraction of the training data and no reasoning overhead, enabling more efficient autonomous systems. Website: https://nord-vla-ai.github.io/
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_21172
institution	arXiv
publishDate	2026
record_format	arxiv
title	NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning
topic	Artificial Intelligence; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.21172
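
Note on the contents field: the difficulty bias it describes can be illustrated with a small sketch of how group-relative advantages are computed. This is not the authors' implementation. It assumes the commonly described formulation in which standard GRPO centers each group's rewards by the group mean and divides by the group standard deviation, while Dr. GRPO keeps the mean-centering and drops the division. The function names, the eps value, and the two reward groups are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantages (assumed form):
    center by the group mean, then divide by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dr_grpo_advantages(rewards):
    """Dr. GRPO-style advantages (assumed form):
    center by the group mean only, with no std division."""
    mu = mean(rewards)
    return [r - mu for r in rewards]


# Hypothetical reward groups: four rollouts for each of two scenarios.
low_variance = [0.50, 0.52, 0.48, 0.50]   # rollouts nearly agree
high_variance = [0.0, 1.0, 0.0, 1.0]      # rollouts disagree strongly


def peak(advantages):
    """Largest advantage magnitude in a group."""
    return max(abs(a) for a in advantages)


for name, group in [("low variance", low_variance),
                    ("high variance", high_variance)]:
    print(name,
          "GRPO peak:", round(peak(grpo_advantages(group)), 3),
          "Dr. GRPO peak:", round(peak(dr_grpo_advantages(group)), 3))
```

With std division, the nearly identical rollouts of the low-variance scenario receive advantages at least as large as those of the high-variance scenario, so the high-variance scenario's reward differences are scaled down relative to their raw size. Without the division, advantage magnitudes track the raw reward gaps. The sketch covers only the advantage computation, not the rest of either training objective.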
