MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Qi, Di; Yang, Tong; Wang, Beining; Zhang, Xiangyu; Zhang, Wenqiang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2501.16617

_version_	1866912206943158272
author	Qi, Di; Yang, Tong; Wang, Beining; Zhang, Xiangyu; Zhang, Wenqiang
author_facet	Qi, Di; Yang, Tong; Wang, Beining; Zhang, Xiangyu; Zhang, Wenqiang
contents	We present a novel framework for dynamic radiance field prediction given monocular video streams. Unlike previous methods that primarily focus on predicting future frames, our method goes a step further by generating explicit 3D representations of the dynamic scene. The framework builds on two core designs. First, we adopt an ego-centric unbounded triplane to explicitly represent the dynamic physical world. Second, we develop a 4D-aware transformer to aggregate features from monocular videos to update the triplane. Coupling these two designs enables us to train the proposed model with large-scale monocular videos in a self-supervised manner. Our model achieves top results in dynamic radiance field prediction on NVIDIA dynamic scenes, demonstrating its strong performance on 4D physical world modeling. In addition, our model shows superior generalizability to unseen scenarios. Notably, we find that our approach exhibits emergent capabilities for geometry and semantic learning.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_16617
institution	arXiv
publishDate	2025
record_format	arxiv
title	Predicting 3D representations for Dynamic Scenes
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2501.16617
