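Staff View: Wild-Drive: Off-Road Scene Captioning and Path Planning via Robust Multi-modal Routing and Efficient Large Language Model :: Library Catalog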

Bibliographic Details
Main Authors:	Wang, Zihang; Li, Xu; Wang, Benwu; Zhu, Wenkai; Chen, Xieyuanli; Kong, Dong; Lyu, Kailin; Du, Yinan; Peng, Yiming; Che, Haoyang
Format:	Preprint
Published:	2026
Subjects:	Robotics; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.00694

_version_	1866914361294979072
author	Wang, Zihang; Li, Xu; Wang, Benwu; Zhu, Wenkai; Chen, Xieyuanli; Kong, Dong; Lyu, Kailin; Du, Yinan; Peng, Yiming; Che, Haoyang
author_facet	Wang, Zihang; Li, Xu; Wang, Benwu; Zhu, Wenkai; Chen, Xieyuanli; Kong, Dong; Lyu, Kailin; Du, Yinan; Peng, Yiming; Che, Haoyang
contents	Explainability and transparent decision-making are essential for the safe deployment of autonomous driving systems. Scene captioning summarizes environmental conditions and risk factors in natural language, improving transparency, safety, and human-robot interaction. However, most existing approaches target structured urban scenarios; in off-road environments, they are vulnerable to single-modality degradations caused by rain, fog, snow, and darkness, and they lack a unified framework that jointly models structured scene captioning and path planning. To bridge this gap, we propose Wild-Drive, an efficient framework for off-road scene captioning and path planning. Wild-Drive adopts modern multimodal encoders and introduces a task-conditioned modality-routing bridge, MoRo-Former, to adaptively aggregate reliable information under degraded sensing. It then integrates an efficient large language model (LLM), together with a planning token and a gated recurrent unit (GRU) decoder, to generate structured captions and predict future trajectories. We also build the OR-C2P Benchmark, which covers structured off-road scene captioning and path planning under diverse sensor corruption conditions. Experiments on the OR-C2P dataset and a self-collected dataset show that Wild-Drive outperforms prior LLM-based methods and remains more stable under degraded sensing. The code and benchmark will be publicly available at https://github.com/wangzihanggg/Wild-Drive.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_00694
institution	arXiv
publishDate	2026
record_format	arxiv
title	Wild-Drive: Off-Road Scene Captioning and Path Planning via Robust Multi-modal Routing and Efficient Large Language Model
topic	Robotics; Artificial Intelligence
url	https://arxiv.org/abs/2603.00694

Similar Items