Staff View: IGen: Scalable Data Generation for Robot Learning from Open-World Images

Bibliographic Details
Main Authors:	Gu, Chenghao; Kang, Haolan; Lin, Junchao; Wang, Jinghe; Wu, Duo; Xie, Shuzhao; Huang, Fanding; Ge, Junchen; Gong, Ziyang; Li, Letian; Zheng, Hongying; Lv, Changwei; Wang, Zhi
Format:	Preprint
Published:	2025
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2512.01773

_version_	1866915938400468992
author	Gu, Chenghao; Kang, Haolan; Lin, Junchao; Wang, Jinghe; Wu, Duo; Xie, Shuzhao; Huang, Fanding; Ge, Junchen; Gong, Ziyang; Li, Letian; Zheng, Hongying; Lv, Changwei; Wang, Zhi
author_facet	Gu, Chenghao; Kang, Haolan; Lin, Junchao; Wang, Jinghe; Wu, Duo; Xie, Shuzhao; Huang, Fanding; Ge, Junchen; Gong, Ziyang; Li, Letian; Zheng, Hongying; Lv, Changwei; Wang, Zhi
contents	The rise of generalist robotic policies has created an exponential demand for large-scale training data. However, on-robot data collection is labor-intensive and often limited to specific environments. In contrast, open-world images capture a vast diversity of real-world scenes that naturally align with robotic manipulation tasks, offering a promising avenue for low-cost, large-scale robot data acquisition. Despite this potential, the lack of associated robot actions hinders the practical use of open-world images for robot learning, leaving this rich visual resource largely unexploited. To bridge this gap, we propose IGen, a framework that scalably generates realistic visual observations and executable actions from open-world images. IGen first converts unstructured 2D pixels into structured 3D scene representations suitable for scene understanding and manipulation. It then leverages the reasoning capabilities of vision-language models to transform scene-specific task instructions into high-level plans and generate low-level actions as SE(3) end-effector pose sequences. From these poses, it synthesizes dynamic scene evolution and renders temporally coherent visual observations. Experiments validate the high quality of visuomotor data generated by IGen, and show that policies trained solely on IGen-synthesized data achieve performance comparable to those trained on real-world data. This highlights the potential of IGen to support scalable data generation from open-world images for generalist robotic policy training.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_01773
institution	arXiv
publishDate	2025
record_format	arxiv
title	IGen: Scalable Data Generation for Robot Learning from Open-World Images
topic	Robotics
url	https://arxiv.org/abs/2512.01773