Bibliographic Details
Main authors: Xu, Linrui, Wang, Zhongan, Shen, Fei, Xu, Gang, Zhuang, Huiping, Li, Ming, Li, Haifeng
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online access: https://arxiv.org/abs/2603.14941
author Xu, Linrui
Wang, Zhongan
Shen, Fei
Xu, Gang
Zhuang, Huiping
Li, Ming
Li, Haifeng
contents Remote sensing world models aim to both explain observed changes and forecast plausible futures, two tasks that share spatiotemporal priors. Existing methods, however, typically address them separately, limiting cross-task transfer. We present RS-WorldModel, a unified world model for remote sensing that jointly handles spatiotemporal change understanding and text-guided future scene forecasting, and we build RSWBench-1.1M, a 1.1-million-sample dataset with rich language annotations covering both tasks. RS-WorldModel is trained in three stages: (1) Geo-Aware Generative Pre-training (GAGP) conditions forecasting on geographic and acquisition metadata; (2) synergistic instruction tuning (SIT) jointly trains understanding and forecasting; (3) verifiable reinforcement optimization (VRO) refines outputs with verifiable, task-specific rewards. With only 2B parameters, RS-WorldModel surpasses open-source models up to 120× larger on most spatiotemporal change question-answering metrics. It achieves an FID of 43.13 on text-guided future scene forecasting, outperforming all open-source baselines as well as the closed-source Gemini-2.5-Flash Image (Nano Banana).
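The forecasting result above is reported as an FID (Fréchet Inception Distance), where lower is better. As a rough illustration of what that number measures, the sketch below computes the standard Fréchet distance between two Gaussians, ||mu1 - mu2||^2 + Tr(S1 + S2 - 2(S1 S2)^{1/2}), from feature means and covariances. It assumes the statistics have already been extracted (FID normally uses Inception-v3 features of real and generated images); the function name `frechet_distance` is hypothetical, and this is not the authors' evaluation code.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    Hypothetical helper: FID applies this formula to the mean and covariance
    of image features; here the statistics are passed in directly.
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)

    # Symmetric square root of sigma1 via eigendecomposition (sigma1 is PSD).
    w, v = np.linalg.eigh(sigma1)
    sqrt_s1 = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

    # Tr((sigma1 sigma2)^{1/2}) equals Tr((sqrt_s1 sigma2 sqrt_s1)^{1/2}) because
    # both products share eigenvalues; the inner matrix is symmetric PSD.
    inner = sqrt_s1 @ sigma2 @ sqrt_s1
    eig = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    tr_covmean = np.sqrt(eig).sum()

    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)
```

For example, two identical distributions give a distance of 0, while shifting the mean of an identity-covariance Gaussian by (3, 4) gives 25. The eigendecomposition route is chosen only to keep the sketch dependent on NumPy alone; practical FID tools typically use a general matrix square root instead.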
format Preprint
id arxiv_https___arxiv_org_abs_2603_14941
institution arXiv
publishDate 2026
record_format arxiv
title RS-WorldModel: a Unified Model for Remote Sensing Understanding and Future Sense Forecasting
topic Artificial Intelligence
url https://arxiv.org/abs/2603.14941