Bibliographic Details
Main Authors: Zhang, Ying; Li, Yuezun; Peng, Bo; Zhou, Jiaran; Zhou, Huiyu; Dong, Junyu
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2404.11054
_version_ 1866914928647995392
author Zhang, Ying
Li, Yuezun
Peng, Bo
Zhou, Jiaran
Zhou, Huiyu
Dong, Junyu
author_facet Zhang, Ying
Li, Yuezun
Peng, Bo
Zhou, Jiaran
Zhou, Huiyu
Dong, Junyu
contents The task of video inpainting detection is to expose the pixel-level inpainted regions within a video sequence. Existing methods usually focus on leveraging spatial and temporal inconsistencies. However, these methods typically employ fixed operations to combine spatial and temporal clues, limiting their applicability in different scenarios. In this paper, we introduce a novel Multilateral Temporal-view Pyramid Transformer (MumPy) that flexibly combines spatial-temporal clues. Our method utilizes a newly designed multilateral temporal-view encoder to extract various collaborations of spatial-temporal clues and introduces a deformable window-based temporal-view interaction module to enhance the diversity of these collaborations. Subsequently, we develop a multi-pyramid decoder to aggregate the various types of features and generate detection maps. By adjusting the contribution strength of spatial and temporal clues, our method can effectively identify inpainted regions. We validate our method on existing datasets and also introduce a new, challenging, large-scale video inpainting dataset based on the YouTube-VOS dataset, which employs several more recent inpainting methods. The results demonstrate the superiority of our method in both in-domain and cross-domain evaluation scenarios.
format Preprint
id arxiv_https___arxiv_org_abs_2404_11054
institution arXiv
publishDate 2024
record_format arxiv
title MumPy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2404.11054