Staff View: ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models :: Library Catalog

Bibliographic Details
Main Authors:	Fang, Zixun; Zhu, Kai; Liu, Zhiheng; Liu, Yu; Zhai, Wei; Cao, Yang; Zha, Zheng-Jun
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.23513

_version_	1866912457286483968
author	Fang, Zixun; Zhu, Kai; Liu, Zhiheng; Liu, Yu; Zhai, Wei; Cao, Yang; Zha, Zheng-Jun
author_facet	Fang, Zixun; Zhu, Kai; Liu, Zhiheng; Liu, Yu; Zhai, Wei; Cao, Yang; Zha, Zheng-Jun
contents	Panoramic video generation aims to synthesize 360-degree immersive videos, holding significant importance in the fields of VR, world models, and spatial intelligence. Existing works fail to synthesize high-quality panoramic videos due to the inherent modality gap between panoramic data and perspective data, which constitutes the majority of the training data for modern diffusion models. In this paper, we propose a novel framework utilizing pretrained perspective video models for generating panoramic videos. Specifically, we design a novel panorama representation named ViewPoint map, which possesses global spatial continuity and fine-grained visual details simultaneously. With our proposed Pano-Perspective attention mechanism, the model benefits from pretrained perspective priors and captures the panoramic spatial correlations of the ViewPoint map effectively. Extensive experiments demonstrate that our method can synthesize highly dynamic and spatially consistent panoramic videos, achieving state-of-the-art performance and surpassing previous methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_23513
institution	arXiv
publishDate	2025
record_format	arxiv
title	ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2506.23513