Staff View: Plenoptic Video Generation :: Library Catalog


Bibliographic Details
Main Authors:	Fu, Xiao; Tang, Shitao; Shi, Min; Liu, Xian; Gu, Jinwei; Liu, Ming-Yu; Lin, Dahua; Lin, Chen-Hsuan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.05239

_version_	1866912810301128704
author	Fu, Xiao; Tang, Shitao; Shi, Min; Liu, Xian; Gu, Jinwei; Liu, Ming-Yu; Lin, Dahua; Lin, Chen-Hsuan
author_facet	Fu, Xiao; Tang, Shitao; Shi, Min; Liu, Xian; Gu, Jinwei; Liu, Ming-Yu; Lin, Dahua; Lin, Chen-Hsuan
contents	Camera-controlled generative video re-rendering methods, such as ReCamMaster, have achieved remarkable progress. However, despite their success in single-view settings, these works often struggle to maintain consistency across multi-view scenarios. Ensuring spatio-temporal coherence in hallucinated regions remains challenging due to the inherent stochasticity of generative models. To address this, we introduce PlenopticDreamer, a framework that synchronizes generative hallucinations to maintain spatio-temporal memory. The core idea is to train a multi-in-single-out video-conditioned model in an autoregressive manner, aided by a camera-guided video retrieval strategy that adaptively selects salient videos from previous generations as conditional inputs. In addition, our training incorporates progressive context-scaling to improve convergence, self-conditioning to enhance robustness against long-range visual degradation caused by error accumulation, and a long-video conditioning mechanism to support extended video generation. Extensive experiments on the Basic and Agibot benchmarks demonstrate that PlenopticDreamer achieves state-of-the-art video re-rendering, delivering superior view synchronization, high-fidelity visuals, accurate camera control, and diverse view transformations (e.g., third-person to third-person, and head-view to gripper-view in robotic manipulation). Project page: https://research.nvidia.com/labs/dir/plenopticdreamer/
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_05239
institution	arXiv
publishDate	2026
record_format	arxiv
title	Plenoptic Video Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2601.05239

Similar Items