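Staff View: VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing :: Library Catalog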

Bibliographic Details
Main Authors:	Bello, Juan Luis Gonzalez; Yao, Xu; Whelan, Alex; Olszewski, Kyle; Kim, Hyeongwoo; Garrido, Pablo
Format:	Preprint
Published:	2025
Subjects:	Image and Video Processing
Online Access:	https://arxiv.org/abs/2504.07146

_version_	1866910908073115648
author	Bello, Juan Luis Gonzalez; Yao, Xu; Whelan, Alex; Olszewski, Kyle; Kim, Hyeongwoo; Garrido, Pablo
author_facet	Bello, Juan Luis Gonzalez; Yao, Xu; Whelan, Alex; Olszewski, Kyle; Kim, Hyeongwoo; Garrido, Pablo
contents	We present an implicit video representation for disentangling occlusions, appearance, and motion in monocular videos, which we call Video SPatiotemporal Splines (VideoSPatS). Unlike previous methods that map time and coordinates to deformation and canonical colors, VideoSPatS maps input coordinates into Spatial and Color Spline deformation fields $D_s$ and $D_c$, which disentangle motion and appearance in videos. With its spline-based parametrization, our method naturally generates temporally consistent flow and guarantees long-term temporal consistency, which is crucial for convincing video editing. Using multiple prediction branches, VideoSPatS also performs layer separation between the latent video and the selected occluder. By disentangling occlusions, appearance, and motion, our method enables better spatiotemporal modeling and editing of diverse videos, including in-the-wild talking-head videos with challenging occlusions, shadows, and specularities, while maintaining an appropriate canonical space for editing. We also present general video modeling results on the DAVIS and CoDeF datasets, as well as on our own talking-head video dataset collected from open-source web videos. Extensive ablations show that combining $D_s$ and $D_c$ under neural splines can overcome motion and appearance ambiguities, paving the way for more advanced video editing models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_07146
institution	arXiv
publishDate	2025
record_format	arxiv
title	VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing
topic	Image and Video Processing
url	https://arxiv.org/abs/2504.07146

Similar Items