Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Honda, Kohei, Hosogaya, Hirotaka
Format:	Preprint
Published:	2026
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2604.07672
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866917394379702272
author	Honda, Kohei Hosogaya, Hirotaka
author_facet	Honda, Kohei Hosogaya, Hirotaka
contents	This paper presents an empirical study of reset-free reinforcement learning (RL) for real-world agile driving, in which a physical 1/10-scale vehicle learns continuously on a slippery indoor track without manual resets. High-speed driving near the limits of tire friction is particularly challenging for learning-based methods because complex vehicle dynamics, actuation delays, and other unmodeled effects hinder both accurate simulation and direct sim-to-real transfer of learned policies. To enable autonomous training on a physical platform, we employ Model Predictive Path Integral control (MPPI) as both the reset policy and the base policy for residual learning, and systematically compare three representative RL algorithms, i.e., PPO, SAC, and TD-MPC2, with and without residual learning in simulation and real-world experiments. Our results reveal a clear gap between simulation and real-world: SAC with residual learning achieves the highest returns in simulation, yet only TD-MPC2 consistently outperforms the MPPI baseline on the physical platform. Moreover, residual learning, while clearly beneficial in simulation, fails to transfer its advantage to the real world and can even degrade performance. These findings reveal that reset-free RL in the real world poses unique challenges absent from simulation, calling for further algorithmic development tailored to training in the wild.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_07672
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Reset-Free Reinforcement Learning for Real-World Agile Driving: An Empirical Study Honda, Kohei Hosogaya, Hirotaka Robotics This paper presents an empirical study of reset-free reinforcement learning (RL) for real-world agile driving, in which a physical 1/10-scale vehicle learns continuously on a slippery indoor track without manual resets. High-speed driving near the limits of tire friction is particularly challenging for learning-based methods because complex vehicle dynamics, actuation delays, and other unmodeled effects hinder both accurate simulation and direct sim-to-real transfer of learned policies. To enable autonomous training on a physical platform, we employ Model Predictive Path Integral control (MPPI) as both the reset policy and the base policy for residual learning, and systematically compare three representative RL algorithms, i.e., PPO, SAC, and TD-MPC2, with and without residual learning in simulation and real-world experiments. Our results reveal a clear gap between simulation and real-world: SAC with residual learning achieves the highest returns in simulation, yet only TD-MPC2 consistently outperforms the MPPI baseline on the physical platform. Moreover, residual learning, while clearly beneficial in simulation, fails to transfer its advantage to the real world and can even degrade performance. These findings reveal that reset-free RL in the real world poses unique challenges absent from simulation, calling for further algorithmic development tailored to training in the wild.
title	Reset-Free Reinforcement Learning for Real-World Agile Driving: An Empirical Study
topic	Robotics
url	https://arxiv.org/abs/2604.07672

Similar Items