Staff View: Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning :: Library Catalog

Bibliographic Details
Main Authors:	Song, Yuehao; Chen, Shaoyu; Gao, Hao; Zhu, Yifan; Yue, Weixiang; Zou, Jialv; Jiang, Bo; Lu, Zihao; Wang, Yu; Zhang, Qian; Wang, Xinggang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.11219

_version_	1866918383734226944
author	Song, Yuehao Chen, Shaoyu Gao, Hao Zhu, Yifan Yue, Weixiang Zou, Jialv Jiang, Bo Lu, Zihao Wang, Yu Zhang, Qian Wang, Xinggang
author_facet	Song, Yuehao Chen, Shaoyu Gao, Hao Zhu, Yifan Yue, Weixiang Zou, Jialv Jiang, Bo Lu, Zihao Wang, Yu Zhang, Qian Wang, Xinggang
contents	Vision-language models (VLMs) enhance the planning capability of end-to-end (E2E) driving policy by leveraging high-level semantic reasoning. However, existing approaches often overlook the dual-system consistency between VLM's high-level decision and E2E's low-level planning. As a result, the generated trajectories may misalign with the intended driving decisions, leading to weakened top-down guidance and decision-following ability of the system. To address this issue, we propose Senna-2, an advanced VLM-E2E driving policy that explicitly aligns the two systems for consistent decision-making and planning. Our method follows a consistency-oriented three-stage training paradigm. In the first stage, we conduct driving pre-training to achieve preliminary decision-making and planning, with a decision adapter transmitting VLM decisions to E2E policy in the form of implicit embeddings. In the second stage, we align the VLM and the E2E policy in an open-loop setting. In the third stage, we perform closed-loop alignment via bottom-up Hierarchical Reinforcement Learning in 3DGS environments to reinforce the safety and efficiency. Extensive experiments demonstrate that Senna-2 achieves superior dual-system consistency (19.3% F1 score improvement) and significantly enhances driving safety in both open-loop (5.7% FDE reduction) and closed-loop settings (30.6% AF-CR reduction).
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_11219
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning Song, Yuehao Chen, Shaoyu Gao, Hao Zhu, Yifan Yue, Weixiang Zou, Jialv Jiang, Bo Lu, Zihao Wang, Yu Zhang, Qian Wang, Xinggang Computer Vision and Pattern Recognition Vision-language models (VLMs) enhance the planning capability of end-to-end (E2E) driving policy by leveraging high-level semantic reasoning. However, existing approaches often overlook the dual-system consistency between VLM's high-level decision and E2E's low-level planning. As a result, the generated trajectories may misalign with the intended driving decisions, leading to weakened top-down guidance and decision-following ability of the system. To address this issue, we propose Senna-2, an advanced VLM-E2E driving policy that explicitly aligns the two systems for consistent decision-making and planning. Our method follows a consistency-oriented three-stage training paradigm. In the first stage, we conduct driving pre-training to achieve preliminary decision-making and planning, with a decision adapter transmitting VLM decisions to E2E policy in the form of implicit embeddings. In the second stage, we align the VLM and the E2E policy in an open-loop setting. In the third stage, we perform closed-loop alignment via bottom-up Hierarchical Reinforcement Learning in 3DGS environments to reinforce the safety and efficiency. Extensive experiments demonstrate that Senna-2 achieves superior dual-system consistency (19.3% F1 score improvement) and significantly enhances driving safety in both open-loop (5.7% FDE reduction) and closed-loop settings (30.6% AF-CR reduction).
title	Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.11219
