| Main Authors: | Huang, Haidong; Zhu, Haiyue; Song, Jiayu; Zhao, Xixin; Zhou, Yaohua; Zhang, Jiayi; Zhai, Yuze; Li, Xiaocong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Robotics; Artificial Intelligence; Machine Learning; 68T05; I.2.8; I.2.9 |
| Online Access: | https://arxiv.org/abs/2511.10087 |
| _version_ | 1866908650368401408 |
|---|---|
| author | Huang, Haidong; Zhu, Haiyue; Song, Jiayu; Zhao, Xixin; Zhou, Yaohua; Zhang, Jiayi; Zhai, Yuze; Li, Xiaocong |
| author_facet | Huang, Haidong; Zhu, Haiyue; Song, Jiayu; Zhao, Xixin; Zhou, Yaohua; Zhang, Jiayi; Zhai, Yuze; Li, Xiaocong |
| contents | Offline-to-online reinforcement learning (O2O-RL) has emerged as a promising paradigm for safe and efficient robotic policy deployment but suffers from two fundamental challenges: limited coverage of multimodal behaviors and distributional shifts during online adaptation. We propose UEPO, a unified generative framework inspired by large language model pretraining and fine-tuning strategies. Our contributions are threefold: (1) a multi-seed dynamics-aware diffusion policy that efficiently captures diverse modalities without training multiple models; (2) a dynamic divergence regularization mechanism that enforces physically meaningful policy diversity; and (3) a diffusion-based data augmentation module that enhances dynamics model generalization. On the D4RL benchmark, UEPO achieves a +5.9% absolute improvement over Uni-O4 on locomotion tasks and +12.4% on dexterous manipulation, demonstrating strong generalization and scalability. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2511_10087 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Opinion: Towards Unified Expressive Policy Optimization for Robust Robot Learning |
| topic | Robotics; Artificial Intelligence; Machine Learning; 68T05; I.2.8; I.2.9 |
| url | https://arxiv.org/abs/2511.10087 |