Bibliographic Details
Main Authors: Wang, Qisen; Zhao, Yifan; Li, Jia
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2602.11845
Abstract: Dynamic reconstruction has made remarkable progress, but monocular input, which is what more practical applications provide, remains challenging. Prevailing works attempt to construct efficient motion representations but lack a unified spatiotemporal decomposition framework, suffering from either holistic temporal optimization or coupled hierarchical spatial composition. To this end, we propose WorldTree, a unified framework comprising a Temporal Partition Tree (TPT), which enables coarse-to-fine optimization through an inheritance-based partition tree for hierarchical temporal decomposition, and Spatial Ancestral Chains (SAC), which recursively query the ancestral hierarchy to provide complementary spatial dynamics while specializing motion representations across ancestral nodes. Experimental results on multiple datasets show that the proposed method improves LPIPS by 8.26% on NVIDIA-LS and mLPIPS by 9.09% on DyCheck compared with the second-best method. Code: https://github.com/iCVTEAM/WorldTree.
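The record describes TPT and SAC only at the level of an abstract. To make the general idea of a temporal partition tree queried through ancestral chains concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation (that lives in the linked repository). It assumes a binary tree in which each node covers a time interval and owns a residual per-point motion that is linear in the node's local normalized time, that splitting a node creates children with zero residuals (so they initially inherit their ancestors' motion), and that querying time t sums the residuals along the leaf-to-root chain. The names `TemporalNode`, `ancestral_chain`, and `displacement`, and the linear residual model, are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

import numpy as np


@dataclass
class TemporalNode:
    """Hypothetical node of a temporal partition tree covering [t_start, t_end]."""

    t_start: float
    t_end: float
    offset: np.ndarray    # (N, 3) residual displacement owned by this node (assumed model)
    velocity: np.ndarray  # (N, 3) residual rate over the node's local normalized time
    parent: TemporalNode | None = None
    children: list[TemporalNode] = field(default_factory=list)

    def split(self) -> None:
        """Coarse-to-fine step: halve the interval into two children.

        Children start with zero residuals, so right after the split they
        reproduce exactly the motion already explained by their ancestors."""
        mid = 0.5 * (self.t_start + self.t_end)
        for a, b in ((self.t_start, mid), (mid, self.t_end)):
            self.children.append(
                TemporalNode(a, b, np.zeros_like(self.offset),
                             np.zeros_like(self.velocity), parent=self))

    def leaf_for(self, t: float) -> TemporalNode:
        """Descend to the finest node whose interval contains t."""
        node = self
        while node.children:
            left, right = node.children
            node = left if t < left.t_end else right
        return node

    def contribution(self, t: float) -> np.ndarray:
        """This node's residual motion at time t (linear in local time, by assumption)."""
        tau = (t - self.t_start) / (self.t_end - self.t_start)
        return self.offset + tau * self.velocity


def ancestral_chain(leaf: TemporalNode) -> list[TemporalNode]:
    """Leaf-to-root list of nodes consulted for a query time."""
    chain = []
    node: TemporalNode | None = leaf
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def displacement(root: TemporalNode, t: float) -> np.ndarray:
    """Sum residual motion over every node on the ancestral chain of time t."""
    return sum(n.contribution(t) for n in ancestral_chain(root.leaf_for(t)))


if __name__ == "__main__":
    # Two points; the root explains a constant shift of 1 along every axis.
    root = TemporalNode(0.0, 1.0, np.ones((2, 3)), np.zeros((2, 3)))
    root.split()                      # refine: two half-interval children
    root.children[1].offset += 2.0    # the second half adds extra motion
    print(displacement(root, 0.2))    # only the root's shift applies here
    print(displacement(root, 0.7))    # root shift plus the child's residual
```

Summing residuals along the chain is one simple way to let coarse nodes capture shared motion while finer nodes specialize; the paper may combine ancestral information differently.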
Institution: arXiv
Title: WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains
Subjects: Computer Vision and Pattern Recognition