Bibliographic Details
Main Authors: Wang, Zehan; Wang, Tengfei; Zhang, Haiyu; Zuo, Xuhui; Wu, Junta; Wang, Haoyuan; Sun, Wenqiang; Wang, Zhenwei; Cao, Chenjie; Zhao, Hengshuang; Guo, Chunchao; Zhao, Zhou
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2602.09022
contents This work presents WorldCompass, a novel Reinforcement Learning (RL) post-training framework for long-horizon, interactive video-based world models that enables them to explore the world more accurately and consistently according to interaction signals. To effectively "steer" the world model's exploration, we introduce three core innovations tailored to the autoregressive video generation paradigm: 1) Clip-level Rollout Strategy: we generate and evaluate multiple samples for a single target clip, which significantly boosts rollout efficiency and provides fine-grained reward signals. 2) Complementary Reward Functions: we design reward functions for both interaction-following accuracy and visual quality, which provide direct supervision and effectively suppress reward-hacking behaviors. 3) Efficient RL Algorithm: we employ the negative-aware fine-tuning strategy, coupled with various efficiency optimizations, to enhance model capacity both efficiently and effectively. Evaluations on WorldPlay, a state-of-the-art (SoTA) open-source world model, demonstrate that WorldCompass significantly improves interaction accuracy and visual fidelity across various scenarios.
format Preprint
id arxiv_https___arxiv_org_abs_2602_09022
institution arXiv
publishDate 2026
record_format arxiv
title WorldCompass: Reinforcement Learning for Long-Horizon World Models
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.09022
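The clip-level rollout and complementary rewards summarized in the abstract can be illustrated with a small sketch. It assumes, hypothetically, that each rollout of the same target clip receives two scalar scores (one for interaction-following accuracy, one for visual quality) and that each score is standardized within the group of samples sharing that clip before the two are mixed, a GRPO-style group normalization chosen here purely for illustration. The names `ClipSample`, `clip_level_advantages`, and the equal weighting are hypothetical; this is not the WorldCompass implementation, and the negative-aware fine-tuning update itself is not shown.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class ClipSample:
    """One rollout of the same target clip (hypothetical structure)."""
    interaction_reward: float  # assumed score for following the interaction signal
    quality_reward: float      # assumed score for visual quality


def normalize(values: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize values within one group: (x - mean) / (std + eps)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def clip_level_advantages(samples: list[ClipSample], w_interaction: float = 0.5) -> list[float]:
    """Group-relative advantages for samples that share the same clip context.

    Each reward is normalized within the group before mixing, so neither term
    dominates purely because of its scale (an illustrative choice, not the paper's).
    """
    if len(samples) < 2:
        raise ValueError("clip-level rollout needs at least two samples per clip")
    inter = normalize([s.interaction_reward for s in samples])
    qual = normalize([s.quality_reward for s in samples])
    w_quality = 1.0 - w_interaction
    return [w_interaction * a + w_quality * b for a, b in zip(inter, qual)]


if __name__ == "__main__":
    # Four hypothetical samples generated for one target clip.
    group = [
        ClipSample(0.9, 0.4),
        ClipSample(0.6, 0.8),
        ClipSample(0.3, 0.5),
        ClipSample(0.6, 0.5),
    ]
    for i, adv in enumerate(clip_level_advantages(group)):
        print(f"sample {i}: advantage {adv:+.3f}")
```

Because all samples in a group share the same preceding context, a positive advantage means a sample did better than its siblings on that one clip, which is one way to read the "fine-grained reward signals" mentioned in the abstract.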