Staff View: WorldCompass: Reinforcement Learning for Long-Horizon World Models :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Zehan; Wang, Tengfei; Zhang, Haiyu; Zuo, Xuhui; Wu, Junta; Wang, Haoyuan; Sun, Wenqiang; Wang, Zhenwei; Cao, Chenjie; Zhao, Hengshuang; Guo, Chunchao; Zhao, Zhou
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.09022

_version_	1866912891520679936
author	Wang, Zehan; Wang, Tengfei; Zhang, Haiyu; Zuo, Xuhui; Wu, Junta; Wang, Haoyuan; Sun, Wenqiang; Wang, Zhenwei; Cao, Chenjie; Zhao, Hengshuang; Guo, Chunchao; Zhao, Zhou
author_facet	Wang, Zehan; Wang, Tengfei; Zhang, Haiyu; Zuo, Xuhui; Wu, Junta; Wang, Haoyuan; Sun, Wenqiang; Wang, Zhenwei; Cao, Chenjie; Zhao, Hengshuang; Guo, Chunchao; Zhao, Zhou
contents	This work presents WorldCompass, a novel Reinforcement Learning (RL) post-training framework for long-horizon, interactive video-based world models, enabling them to explore the world more accurately and consistently based on interaction signals. To effectively "steer" the world model's exploration, we introduce three core innovations tailored to the autoregressive video generation paradigm: 1) Clip-Level Rollout Strategy: We generate and evaluate multiple samples at a single target clip, which significantly boosts rollout efficiency and provides fine-grained reward signals. 2) Complementary Reward Functions: We design reward functions for both interaction-following accuracy and visual quality, which provide direct supervision and effectively suppress reward-hacking behaviors. 3) Efficient RL Algorithm: We employ a negative-aware fine-tuning strategy coupled with various efficiency optimizations to enhance model capacity efficiently and effectively. Evaluations on the state-of-the-art (SoTA) open-source world model, WorldPlay, demonstrate that WorldCompass significantly improves interaction accuracy and visual fidelity across various scenarios.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_09022
institution	arXiv
publishDate	2026
record_format	arxiv
title	WorldCompass: Reinforcement Learning for Long-Horizon World Models
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.09022

Similar Items