Bibliographic Details
Main Authors: Qi, Xia; Cong, Peishan; Yao, Yichen; Wang, Ziyi; Ye, Yaoqin; Ma, Yuexin
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access: https://arxiv.org/abs/2604.14556
_version_ 1866911597448921088
author Qi, Xia
Cong, Peishan
Yao, Yichen
Wang, Ziyi
Ye, Yaoqin
Ma, Yuexin
contents Video object insertion is a critical task for dynamically inserting new objects into existing environments. Previous video generation methods focus primarily on synthesizing entire scenes while struggling with ensuring consistent object appearance, spatial alignment, and temporal coherence when inserting objects into existing videos. In this paper, we propose a novel solution for Video Object Insertion, which integrates multi-view object priors to address the common challenges of appearance inconsistency and occlusion handling in dynamic environments. By lifting 2D reference images into multi-view representations and leveraging a dual-path view-consistent conditioning mechanism, our framework ensures stable identity guidance and robust integration across diverse viewpoints. A quality-aware weighting mechanism is also employed to adaptively handle noisy or imperfect inputs. Additionally, we introduce an Integration-Aware Consistency Module that guarantees spatial realism, effectively resolving occlusion and boundary artifacts while maintaining temporal continuity across frames. Experimental results show that our solution significantly improves the quality of video object insertion, providing stable and realistic integration.
format Preprint
id arxiv_https___arxiv_org_abs_2604_14556
institution arXiv
publishDate 2026
record_format arxiv
title Controllable Video Object Insertion via Multiview Priors
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2604.14556