Bibliographic Details
Main Authors: Yu, Kelin; Zhang, Sheng; Soora, Harshit; Huang, Furong; Huang, Heng; Tokekar, Pratap; Gao, Ruohan
Format: Preprint
Published: 2025
Subjects: Robotics; Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2508.11049
author Yu, Kelin
Zhang, Sheng
Soora, Harshit
Huang, Furong
Huang, Heng
Tokekar, Pratap
Gao, Ruohan
contents Recent advances have shown that video generation models can enhance robot learning by deriving effective robot actions through inverse dynamics. However, these methods depend heavily on the quality of the generated data and struggle with fine-grained manipulation because they lack environment feedback. While video-based reinforcement learning improves policy robustness, it remains constrained by the uncertainty of video generation and by the difficulty of collecting large-scale robot datasets for training diffusion models. To address these limitations, we propose GenFlowRL, which derives shaped rewards from generated object-centric flow, using a flow generation model trained on diverse cross-embodiment datasets. This enables learning generalizable and robust policies from diverse demonstrations using low-dimensional, object-centric features. Experiments on 10 manipulation tasks, in both simulation and real-world cross-embodiment evaluations, demonstrate that GenFlowRL effectively leverages manipulation features extracted from generated object-centric flow, consistently achieving superior performance across diverse and challenging scenarios. Project page: https://colinyu1.github.io/genflowrl
format Preprint
id arxiv_https___arxiv_org_abs_2508_11049
institution arXiv
publishDate 2025
record_format arxiv
title GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
topic Robotics
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2508.11049