Bibliographic Details
Main Authors: Guo, Ziang; Min, Chen; Zhang, Xuefeng; Zhou, Yixiao; Wang, Shuo; Zheng, Sifa; Tsetserukou, Dzmitry; Zhang, Zufeng
Format: Preprint
Published: 2026
Subjects: Robotics
Online Access: https://arxiv.org/abs/2604.28111
author Guo, Ziang
Min, Chen
Zhang, Xuefeng
Zhou, Yixiao
Wang, Shuo
Zheng, Sifa
Tsetserukou, Dzmitry
Zhang, Zufeng
contents End-to-end (E2E) autonomous driving aims to directly map sensory observations to driving actions, but its real-world deployment is hindered by evolving data distributions and the high cost of continual annotation. While combining imitation learning (IL) and reinforcement learning (RL) is a common strategy for policy improvement, conventional RL training relies on delayed, event-based rewards, where policies learn only from catastrophic outcomes such as collisions, leading to premature convergence to suboptimal behaviors. To address these limitations, we propose GSDrive, a framework that uses a differentiable 3D Gaussian Splatting (3DGS) environment for future-aware trajectory probing and reward shaping in E2E driving. GSDrive first learns a multi-mode trajectory probe via IL and then uses RL to evaluate multiple candidate futures in the 3DGS environment, converting their simulated returns into dense shaping rewards for policy optimization. This yields a cyclic hybrid IL-RL training loop, where IL supplies structured future priors and RL provides interactive feedback for iterative refinement. Evaluated on the reconstructed nuScenes dataset, our method outperforms other simulation-based RL approaches in closed-loop experiments. Code is available at https://github.com/ZionGo6/GSDrive.
format Preprint
id arxiv_https___arxiv_org_abs_2604_28111
institution arXiv
publishDate 2026
record_format arxiv
title GSDrive: Reinforcing Driving Policies by Multi-mode Future Trajectory Probing with 3D Gaussian Splatting Environment
topic Robotics
url https://arxiv.org/abs/2604.28111
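
The abstract describes converting the simulated returns of several candidate futures into dense shaping rewards. The following is a minimal sketch of one way such a conversion could work, not the authors' implementation: it assumes a softmax-weighted aggregation of the candidate returns and swaps in standard potential-based reward shaping, r + gamma * Phi(s') - Phi(s). The function names `probe_potential` and `shaped_reward`, the `temperature` parameter, and the aggregation rule are all hypothetical and do not come from the record or the GSDrive code.

```python
import math
from typing import Sequence


def probe_potential(candidate_returns: Sequence[float],
                    temperature: float = 1.0) -> float:
    """Aggregate the simulated returns of K candidate futures into one scalar.

    Assumption: a softmax-weighted average, so higher-return futures
    dominate as the temperature shrinks. The record does not specify
    how GSDrive aggregates returns.
    """
    if not candidate_returns:
        raise ValueError("need at least one candidate return")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    best = max(candidate_returns)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp((r - best) / temperature) for r in candidate_returns]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, candidate_returns)) / total


def shaped_reward(event_reward: float,
                  returns_now: Sequence[float],
                  returns_next: Sequence[float],
                  gamma: float = 0.99,
                  temperature: float = 1.0) -> float:
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s).

    Phi is the aggregated return of the candidate futures probed from a
    state, so the policy gets a dense signal even when the sparse
    event reward (for example, a collision penalty) is zero.
    """
    phi_now = probe_potential(returns_now, temperature)
    phi_next = probe_potential(returns_next, temperature)
    return event_reward + gamma * phi_next - phi_now


if __name__ == "__main__":
    # No event fired (reward 0.0), but the probed futures improved
    # from a return of 1.0 to 2.0, so the shaped reward is positive.
    print(shaped_reward(0.0, [1.0, 1.0], [2.0, 2.0], gamma=1.0))  # prints 1.0
```

In this sketch the candidate returns would come from rolling out each probed trajectory in the simulated environment; here they are plain numbers so the example stays self-contained.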