| Main authors: | Wan, Zhen; Qi, Chenyang; Liu, Zhiheng; Gui, Tao; Ma, Yue |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online access: | https://arxiv.org/abs/2412.06340 |
| author | Wan, Zhen; Qi, Chenyang; Liu, Zhiheng; Gui, Tao; Ma, Yue |
|---|---|
| contents | In this paper, we present UniPaint, a unified generative space-time video inpainting framework that enables both spatial-temporal inpainting and interpolation. Unlike existing methods that treat video inpainting and video interpolation as two distinct tasks, we tackle both with a single inpainting framework and observe that the two tasks mutually enhance synthesis performance. Specifically, we first introduce a plug-and-play space-time video inpainting adapter that can be employed with various personalized models. The key idea is a Mixture-of-Experts (MoE) attention mechanism that covers the various tasks. We then design a spatial-temporal masking strategy for the training stage so that the tasks reinforce each other and improve performance. UniPaint produces high-quality, aesthetically pleasing results and achieves the best quantitative results across various tasks and scale setups. The code and checkpoints are available at https://github.com/mmmmm-w/UniPaint. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2412_06340 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2412.06340 |
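The abstract treats video interpolation as a special case of inpainting: both amount to filling a masked region of the space-time volume. As a minimal illustration of that framing only (not UniPaint's released code), the sketch below builds binary masks where spatial inpainting hides the same rectangle in every frame, interpolation hides whole frames, and a combined task takes the union. The function names, the half-size hole, keeping the first and last frames as keyframes, and the uniform task sampling in `sample_training_mask` are all hypothetical assumptions, not the paper's actual masking strategy.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical mask layout: mask[t][y][x], 1 = pixel to synthesize, 0 = known pixel.
Mask = List[List[List[int]]]


def spatial_mask(num_frames: int, height: int, width: int,
                 box: Tuple[int, int, int, int]) -> Mask:
    """Inpainting: hide the same rectangle (y0, y1, x0, x1) in every frame."""
    y0, y1, x0, x1 = box
    return [[[1 if y0 <= y < y1 and x0 <= x < x1 else 0 for x in range(width)]
             for y in range(height)]
            for _ in range(num_frames)]


def temporal_mask(num_frames: int, height: int, width: int,
                  keep_frames: Sequence[int]) -> Mask:
    """Interpolation: hide every pixel of each frame not listed in keep_frames."""
    keep = set(keep_frames)
    return [[[0 if t in keep else 1 for _ in range(width)]
             for _ in range(height)]
            for t in range(num_frames)]


def union(a: Mask, b: Mask) -> Mask:
    """Combined task: a pixel is hidden if either mask hides it."""
    return [[[max(pa, pb) for pa, pb in zip(row_a, row_b)]
             for row_a, row_b in zip(frame_a, frame_b)]
            for frame_a, frame_b in zip(a, b)]


def sample_training_mask(num_frames: int, height: int, width: int,
                         rng: random.Random) -> Tuple[str, Mask]:
    """Hypothetical training-time sampler: pick a task uniformly, build its mask."""
    task = rng.choice(("inpaint", "interpolate", "both"))
    bh, bw = max(1, height // 2), max(1, width // 2)  # assumed half-size hole
    y0 = rng.randrange(0, height - bh + 1)
    x0 = rng.randrange(0, width - bw + 1)
    spatial = spatial_mask(num_frames, height, width, (y0, y0 + bh, x0, x0 + bw))
    # Assumed keyframes for interpolation: first and last frame stay visible.
    temporal = temporal_mask(num_frames, height, width, (0, num_frames - 1))
    if task == "inpaint":
        return task, spatial
    if task == "interpolate":
        return task, temporal
    return task, union(spatial, temporal)


if __name__ == "__main__":
    task, mask = sample_training_mask(4, 4, 4, random.Random(0))
    hidden = sum(v for frame in mask for row in frame for v in row)
    print(task, hidden)
```

Under this framing a single model only ever answers one question, "fill whatever the mask marks as 1", which is the sense in which the abstract describes inpainting and interpolation as one problem.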