Bibliographic Details
Main Authors: InSpatio Team; Shen, Donghui; Zhang, Guofeng; Liu, Haomin; Ji, Haoyu; Liu, Jialin; Guo, Jing; Wang, Nan; Pan, Siji; Pan, Weihong; Xie, Weijian; Xiang, Xiaojun; Zhang, Xiaoyu; Liu, Xianbin; Wang, Yifu; Chen, Yipeng; Le, Zhewen; Ye, Zhichao; Zhao, Ziqiang
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2603.11911
Abstract:
  • We present InSpatio-WorldFM, an open-source real-time frame model for spatial intelligence. Unlike video-based world models that rely on sequential frame generation and incur substantial latency due to window-level processing, InSpatio-WorldFM adopts a frame-based paradigm that generates each frame independently, enabling low-latency real-time spatial inference. By enforcing multi-view spatial consistency through explicit 3D anchors and implicit spatial memory, the model preserves global scene geometry while maintaining fine-grained visual details across viewpoint changes. We further introduce a progressive three-stage training pipeline that transforms a pretrained image diffusion model into a controllable frame model and finally into a real-time generator through few-step distillation. Experimental results show that InSpatio-WorldFM achieves strong multi-view consistency while supporting interactive exploration on consumer-grade GPUs, providing an efficient alternative to traditional video-based world models for real-time world simulation.
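The abstract's latency argument (window-level processing versus independent per-frame generation) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' method or measurements: the function names, the 50 ms per-frame cost, and the 16-frame window are hypothetical. It also assumes both paradigms spend the same compute per frame, which ignores any amortization a real window-based model may gain from joint processing.

```python
from statistics import mean


def control_latency_ms(arrival_ms: float, per_frame_ms: float, frames_per_step: int) -> float:
    """Delay from a control input to the first generated frame that reflects it.

    Hypothetical model: each generation step produces `frames_per_step` frames jointly,
    costs `frames_per_step * per_frame_ms`, and releases its frames only when it finishes.
    An input that arrives while a step is running is applied at the next step.
    """
    step_ms = frames_per_step * per_frame_ms
    # Wait for the step already in progress to finish.
    remaining_ms = step_ms - (arrival_ms % step_ms)
    # Then the next step must run to completion before its frames are available.
    return remaining_ms + step_ms


def mean_latency_ms(per_frame_ms: float, frames_per_step: int, samples: int = 1000) -> float:
    """Average latency for inputs arriving at evenly spaced times within one step."""
    step_ms = frames_per_step * per_frame_ms
    arrivals = [step_ms * i / samples for i in range(samples)]
    return mean(control_latency_ms(t, per_frame_ms, frames_per_step) for t in arrivals)


if __name__ == "__main__":
    per_frame_ms = 50.0  # hypothetical cost of generating one frame
    window = mean_latency_ms(per_frame_ms, frames_per_step=16)  # window-based, 16-frame window
    frame = mean_latency_ms(per_frame_ms, frames_per_step=1)    # frame-based, one frame per step
    print(f"window-based: {window:.1f} ms, frame-based: {frame:.1f} ms")
```

Under these assumptions the average delay grows linearly with the number of frames generated per step, which is the intuition behind generating each frame independently for interactive use.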