Staff View: :: Library Catalog


Bibliographic Details
Main Authors:	Xu, Linrui; Wang, Zhongan; Shen, Fei; Xu, Gang; Zhuang, Huiping; Li, Ming; Li, Haifeng
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.14941

_version_	1866910054130647040
author	Xu, Linrui; Wang, Zhongan; Shen, Fei; Xu, Gang; Zhuang, Huiping; Li, Ming; Li, Haifeng
author_facet	Xu, Linrui; Wang, Zhongan; Shen, Fei; Xu, Gang; Zhuang, Huiping; Li, Ming; Li, Haifeng
contents	Remote sensing world models aim to both explain observed changes and forecast plausible futures, two tasks that share spatiotemporal priors. Existing methods, however, typically address them separately, limiting cross-task transfer. We present RS-WorldModel, a unified world model for remote sensing that jointly handles spatiotemporal change understanding and text-guided future scene forecasting, and we build RSWBench-1.1M, a 1.1 million sample dataset with rich language annotations covering both tasks. RS-WorldModel is trained in three stages: (1) Geo-Aware Generative Pre-training (GAGP) conditions forecasting on geographic and acquisition metadata; (2) synergistic instruction tuning (SIT) jointly trains understanding and forecasting; (3) verifiable reinforcement optimization (VRO) refines outputs with verifiable, task-specific rewards. With only 2B parameters, RS-WorldModel surpasses open-source models up to 120$\times$ larger on most spatiotemporal change question-answering metrics. It achieves an FID of 43.13 on text-guided future scene forecasting, outperforming all open-source baselines as well as the closed-source Gemini-2.5-Flash Image (Nano Banana).
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_14941
institution	arXiv
publishDate	2026
record_format	arxiv
title	RS-WorldModel: a Unified Model for Remote Sensing Understanding and Future Sense Forecasting
topic	Artificial Intelligence
url	https://arxiv.org/abs/2603.14941
