Staff View: ChronosObserver: Taming 4D World with Hyperspace Diffusion Sampling :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Qisen; Zhao, Yifan; Shen, Peisen; Li, Jialu; Li, Jia
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.01481

_version_	1866908683889278976
author	Wang, Qisen; Zhao, Yifan; Shen, Peisen; Li, Jialu; Li, Jia
author_facet	Wang, Qisen; Zhao, Yifan; Shen, Peisen; Li, Jialu; Li, Jia
contents	Although prevailing camera-controlled video generation models can produce cinematic results, lifting them directly to generate 3D-consistent, high-fidelity, time-synchronized multi-view videos, a pivotal capability for taming 4D worlds, remains challenging. Some works resort to data augmentation or test-time optimization, but these strategies are constrained by limited model generalization and poor scalability. To this end, we propose ChronosObserver, a training-free method comprising a World State Hyperspace, which represents the spatiotemporal constraints of a 4D world scene, and Hyperspace Guided Sampling, which uses this hyperspace to synchronize the diffusion sampling trajectories of multiple views. Experimental results demonstrate that our method generates high-fidelity, 3D-consistent, time-synchronized multi-view videos without training or fine-tuning the diffusion models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_01481
institution	arXiv
publishDate	2025
record_format	arxiv
title	ChronosObserver: Taming 4D World with Hyperspace Diffusion Sampling
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2512.01481
