Staff View: Generative Omnimatte: Learning to Decompose Video into Layers :: Library Catalog

Bibliographic Details
Main Authors:	Lee, Yao-Chih; Lu, Erika; Rumbley, Sarah; Geyer, Michal; Huang, Jia-Bin; Dekel, Tali; Cole, Forrester
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2411.16683

_version_	1866913752923766784
author	Lee, Yao-Chih; Lu, Erika; Rumbley, Sarah; Geyer, Michal; Huang, Jia-Bin; Dekel, Tali; Cole, Forrester
author_facet	Lee, Yao-Chih; Lu, Erika; Rumbley, Sarah; Geyer, Michal; Huang, Jia-Bin; Dekel, Tali; Cole, Forrester
contents	Given a video and a set of input object masks, an omnimatte method aims to decompose the video into semantically meaningful layers containing individual objects along with their associated effects, such as shadows and reflections. Existing omnimatte methods assume a static background or accurate pose and depth estimation and produce poor decompositions when these assumptions are violated. Furthermore, due to the lack of generative prior on natural videos, existing methods cannot complete dynamic occluded regions. We present a novel generative layered video decomposition framework to address the omnimatte problem. Our method does not assume a stationary scene or require camera pose or depth information and produces clean, complete layers, including convincing completions of occluded dynamic regions. Our core idea is to train a video diffusion model to identify and remove scene effects caused by a specific object. We show that this model can be finetuned from an existing video inpainting model with a small, carefully curated dataset, and demonstrate high-quality decompositions and editing results for a wide range of casually captured videos containing soft shadows, glossy reflections, splashing water, and more.
format	Preprint
id	arxiv_https___arxiv_org_abs_2411_16683
institution	arXiv
publishDate	2024
record_format	arxiv
title	Generative Omnimatte: Learning to Decompose Video into Layers
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2411.16683
