Bibliographic Details
Main Authors: Wan, Zhen; Qi, Chenyang; Liu, Zhiheng; Gui, Tao; Ma, Yue
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2412.06340
_version_ 1866911080506195968
author Wan, Zhen
Qi, Chenyang
Liu, Zhiheng
Gui, Tao
Ma, Yue
contents In this paper, we present UniPaint, a unified generative space-time video inpainting framework that supports both spatial-temporal inpainting and interpolation. Unlike existing methods, which treat video inpainting and video interpolation as two distinct tasks, we tackle both with a single inpainting framework and observe that the two tasks mutually improve synthesis performance. Specifically, we first introduce a plug-and-play space-time video inpainting adapter, which can be used with various personalized models. The key idea is a Mixture of Experts (MoE) attention that covers the various tasks. We then design a spatial-temporal masking strategy for the training stage so that the two tasks reinforce each other and performance improves. UniPaint produces high-quality and aesthetically pleasing results, achieving the best quantitative results across various tasks and scale setups. The code and checkpoints are available at https://github.com/mmmmm-w/UniPaint.
format Preprint
id arxiv_https___arxiv_org_abs_2412_06340
institution arXiv
publishDate 2024
record_format arxiv
title UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2412.06340
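
The abstract mentions a spatial-temporal masking strategy that lets inpainting and interpolation be trained together, but it does not describe how the masks are built. The following is only a minimal sketch of one way such masks could look, not the authors' implementation. The function names, the convention that 1 marks a region to be synthesized, the rectangular spatial mask, the choice to keep the first and last frames for interpolation, and the 50/50 sampling probability are all assumptions made for illustration. It also assumes the frame height and width are at least 2.

```python
import random


def spatial_mask(num_frames, height, width, box):
    """Mask the same rectangle in every frame (spatial inpainting case).

    box is (top, left, bottom, right); 1 marks pixels to be synthesized.
    """
    top, left, bottom, right = box
    return [
        [
            [1 if top <= y < bottom and left <= x < right else 0
             for x in range(width)]
            for y in range(height)
        ]
        for _ in range(num_frames)
    ]


def temporal_mask(num_frames, height, width, keep_frames):
    """Mask every frame except the kept ones (frame interpolation case)."""
    keep = set(keep_frames)
    return [
        [[0 if t in keep else 1 for _ in range(width)] for _ in range(height)]
        for t in range(num_frames)
    ]


def sample_mask(num_frames, height, width, rng, p_temporal=0.5):
    """Pick one of the two mask types at random for a training sample.

    p_temporal is a hypothetical mixing probability, not a value from the paper.
    """
    if rng.random() < p_temporal:
        # Keep only the first and last frames; the frames in between are masked.
        mask = temporal_mask(num_frames, height, width, [0, num_frames - 1])
        return "temporal", mask
    # Otherwise mask a random rectangle covering half of each dimension.
    top = rng.randrange(0, height // 2)
    left = rng.randrange(0, width // 2)
    box = (top, left, top + height // 2, left + width // 2)
    return "spatial", spatial_mask(num_frames, height, width, box)
```

Calling `sample_mask(16, 64, 64, random.Random(0))` returns a label and a nested list indexed as `mask[frame][row][column]`. Drawing both mask types from one sampler is what allows a single model to see both tasks during training, which is the general idea the abstract describes.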