Bibliographic Details
Main Authors: Wang, Zihang; Li, Xu; Wang, Benwu; Zhu, Wenkai; Chen, Xieyuanli; Kong, Dong; Lyu, Kailin; Du, Yinan; Peng, Yiming; Che, Haoyang
Format: Preprint
Published: 2026
Subjects: Robotics; Artificial Intelligence
Online Access: https://arxiv.org/abs/2603.00694
_version_ 1866914361294979072
author Wang, Zihang
Li, Xu
Wang, Benwu
Zhu, Wenkai
Chen, Xieyuanli
Kong, Dong
Lyu, Kailin
Du, Yinan
Peng, Yiming
Che, Haoyang
author_facet Wang, Zihang
Li, Xu
Wang, Benwu
Zhu, Wenkai
Chen, Xieyuanli
Kong, Dong
Lyu, Kailin
Du, Yinan
Peng, Yiming
Che, Haoyang
contents Explainability and transparent decision-making are essential for the safe deployment of autonomous driving systems. Scene captioning summarizes environmental conditions and risk factors in natural language, improving transparency, safety, and human-robot interaction. However, most existing approaches target structured urban scenarios; in off-road environments, they are vulnerable to single-modality degradations caused by rain, fog, snow, and darkness, and they lack a unified framework that jointly models structured scene captioning and path planning. To bridge this gap, we propose Wild-Drive, an efficient framework for off-road scene captioning and path planning. Wild-Drive adopts modern multimodal encoders and introduces a task-conditioned modality-routing bridge, MoRo-Former, to adaptively aggregate reliable information under degraded sensing. It then integrates an efficient large language model (LLM), together with a planning token and a gated recurrent unit (GRU) decoder, to generate structured captions and predict future trajectories. We also build the OR-C2P Benchmark, which covers structured off-road scene captioning and path planning under diverse sensor corruption conditions. Experiments on the OR-C2P dataset and a self-collected dataset show that Wild-Drive outperforms prior LLM-based methods and remains more stable under degraded sensing. The code and benchmark will be publicly available at https://github.com/wangzihanggg/Wild-Drive.
format Preprint
id arxiv_https___arxiv_org_abs_2603_00694
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Wild-Drive: Off-Road Scene Captioning and Path Planning via Robust Multi-modal Routing and Efficient Large Language Model
Wang, Zihang
Li, Xu
Wang, Benwu
Zhu, Wenkai
Chen, Xieyuanli
Kong, Dong
Lyu, Kailin
Du, Yinan
Peng, Yiming
Che, Haoyang
Robotics
Artificial Intelligence
Explainability and transparent decision-making are essential for the safe deployment of autonomous driving systems. Scene captioning summarizes environmental conditions and risk factors in natural language, improving transparency, safety, and human-robot interaction. However, most existing approaches target structured urban scenarios; in off-road environments, they are vulnerable to single-modality degradations caused by rain, fog, snow, and darkness, and they lack a unified framework that jointly models structured scene captioning and path planning. To bridge this gap, we propose Wild-Drive, an efficient framework for off-road scene captioning and path planning. Wild-Drive adopts modern multimodal encoders and introduces a task-conditioned modality-routing bridge, MoRo-Former, to adaptively aggregate reliable information under degraded sensing. It then integrates an efficient large language model (LLM), together with a planning token and a gated recurrent unit (GRU) decoder, to generate structured captions and predict future trajectories. We also build the OR-C2P Benchmark, which covers structured off-road scene captioning and path planning under diverse sensor corruption conditions. Experiments on the OR-C2P dataset and a self-collected dataset show that Wild-Drive outperforms prior LLM-based methods and remains more stable under degraded sensing. The code and benchmark will be publicly available at https://github.com/wangzihanggg/Wild-Drive.
title Wild-Drive: Off-Road Scene Captioning and Path Planning via Robust Multi-modal Routing and Efficient Large Language Model
topic Robotics
Artificial Intelligence
url https://arxiv.org/abs/2603.00694