Bibliographic Details
Main Authors: Zhao, Chang, Yang, Zheming, Hu, Yunqing, Guo, Qi, Wang, Zijian, Li, Pengcheng, Ji, Wen
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2601.04714
_version_ 1866918277914034176
author Zhao, Chang
Yang, Zheming
Hu, Yunqing
Guo, Qi
Wang, Zijian
Li, Pengcheng
Ji, Wen
contents Large language models (LLMs) are increasingly applied to autonomous driving. However, existing methods suffer from unstructured reasoning, poor generalization, and misalignment with human driving intent. Chain-of-Thought (CoT) reasoning makes decisions more transparent, but conventional supervised fine-tuning (SFT) fails to fully exploit its potential, and reinforcement learning (RL) approaches face training instability and suboptimal reasoning depth. We propose ThinkDrive, a CoT-guided progressive RL fine-tuning framework for autonomous driving that combines explicit reasoning with difficulty-aware adaptive policy optimization. Our method uses a two-stage training strategy. First, we perform SFT using CoT explanations. Then, we apply progressive RL with a difficulty-aware adaptive policy optimizer that dynamically adjusts learning intensity based on sample complexity. We evaluate our approach on a public dataset. The results show that ThinkDrive outperforms strong RL baselines by 1.45%, 1.95%, and 1.01% on the exam, easy-exam, and accuracy metrics, respectively. Moreover, a 2B-parameter model trained with our method surpasses the much larger GPT-4o by 3.28% on the exam metric.
format Preprint
id arxiv_https___arxiv_org_abs_2601_04714
institution arXiv
publishDate 2026
record_format arxiv
title ThinkDrive: Chain-of-Thought Guided Progressive Reinforcement Learning Fine-Tuning for Autonomous Driving
topic Artificial Intelligence
url https://arxiv.org/abs/2601.04714
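
Illustrative note: the abstract says the RL stage "dynamically adjusts learning intensity based on sample complexity" but gives no formula. The sketch below is one hypothetical way such a difficulty-aware weighting could look, not the authors' optimizer. It assumes rewards in [0, 1], estimates difficulty as one minus the mean reward over a group of sampled responses to the same prompt, and scales group-normalized advantages (a GRPO-style normalization that the abstract does not mention) by a linear weight. All function names and the `low`/`high` bounds are made up for illustration.

```python
from statistics import mean, pstdev


def difficulty(rewards):
    """Hypothetical difficulty score: 1 minus the group's mean reward.

    Assumes each reward lies in [0, 1], so the score also lies in [0, 1].
    """
    return 1.0 - mean(rewards)


def difficulty_weight(d, low=0.5, high=1.5):
    """Linearly map a difficulty in [0, 1] to a weight in [low, high].

    The bounds are arbitrary illustrative values, not from the paper.
    """
    return low + (high - low) * d


def weighted_advantages(rewards, eps=1e-6):
    """Group-normalized advantages scaled by the difficulty weight.

    Harder groups (lower mean reward) receive a larger update signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    w = difficulty_weight(difficulty(rewards))
    return [w * (r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    hard = [1, 0, 0, 0]  # one of four sampled responses is correct
    easy = [1, 1, 1, 0]  # three of four sampled responses are correct
    print(weighted_advantages(hard))
    print(weighted_advantages(easy))
```

Under these assumptions, a correct response in the hard group gets a larger advantage than a correct response in the easy group, which is one plausible reading of "adjusting learning intensity". The paper itself should be consulted for the actual optimizer.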