Bibliographic Details
Main Authors: Yuan, Yu; Yuan, Jianhao; Wang, Xijun; Li, Daiqing; He, Liu; Ling, Lu; Chan, Stanley H.
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2606.00499
Table of Contents:
  • Video generation models are becoming a scalable form of world model, but they mainly generate plausible motion rather than proactively controlling or optimizing the underlying dynamics. As a result, an object in a generated video may follow a trajectory that is unsafe, non-smooth, inefficient, or physically inconsistent. In this work, we propose OptiWorld, a framework that brings classical optimal control into video generation at inference time. OptiWorld first extracts a compact, task-relevant world state, then plans an optimal trajectory under physical constraints, and finally renders the video conditioned on this trajectory. We formulate planning as a geometric problem on a continuous manifold, which converts 3D geometry and task-dependent physical constraints into a unified planning geometry. By adding this optimal-control layer, OptiWorld generates videos with preferable dynamics, demonstrating strong potential in multiple tasks, including goal-conditioned image-to-video generation, video dynamics editing, and counterfactual generation.