Bibliographic Details
Main Authors: Yin, Tenny; Mei, Zhiting; Zheng, Zhonghe; Yamane, Miyu; Wang, David; Sceats, Jade; Bateman, Samuel M.; Zha, Lihan; Badithela, Apurva; Shorinwa, Ola; Majumdar, Anirudha
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2603.09030
Table of Contents:
  • Action-conditioned video models offer a promising path to building general-purpose robot simulators that can improve directly from data. Yet, despite training on large-scale robot datasets, current state-of-the-art video models still struggle to predict physically consistent robot-object interactions that are crucial in robotic manipulation. To close this gap, we present PlayWorld, a simple, scalable, and fully autonomous pipeline for training high-fidelity video world simulators from interaction experience. In contrast to prior approaches that rely on success-biased human demonstrations, PlayWorld is the first system capable of learning entirely from unsupervised robot self-play, enabling naturally scalable data collection while capturing complex, long-tailed physical interactions essential for modeling realistic object dynamics. Experiments across diverse manipulation tasks show that PlayWorld generates high-quality, physically consistent predictions for contact-rich interactions that are not captured by world models trained on human-collected data. We further demonstrate the versatility of PlayWorld in enabling fine-grained failure prediction and policy evaluation, with up to 40% improvements over human-collected data. Finally, we demonstrate how PlayWorld enables reinforcement learning in the world model, improving policy performance by 65% in success rates when deployed in the real world.