MARC21: :: Library Catalog

Bibliographic Details
Main Authors:	Qi, Xia; Cong, Peishan; Yao, Yichen; Wang, Ziyi; Ye, Yaoqin; Ma, Yuexin
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.14556

_version_	1866911597448921088
author	Qi, Xia; Cong, Peishan; Yao, Yichen; Wang, Ziyi; Ye, Yaoqin; Ma, Yuexin
contents	Video object insertion is a critical task for dynamically inserting new objects into existing environments. Previous video generation methods focus primarily on synthesizing entire scenes while struggling with ensuring consistent object appearance, spatial alignment, and temporal coherence when inserting objects into existing videos. In this paper, we propose a novel solution for Video Object Insertion, which integrates multi-view object priors to address the common challenges of appearance inconsistency and occlusion handling in dynamic environments. By lifting 2D reference images into multi-view representations and leveraging a dual-path view-consistent conditioning mechanism, our framework ensures stable identity guidance and robust integration across diverse viewpoints. A quality-aware weighting mechanism is also employed to adaptively handle noisy or imperfect inputs. Additionally, we introduce an Integration-Aware Consistency Module that guarantees spatial realism, effectively resolving occlusion and boundary artifacts while maintaining temporal continuity across frames. Experimental results show that our solution significantly improves the quality of video object insertion, providing stable and realistic integration.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_14556
institution	arXiv
publishDate	2026
record_format	arxiv
title	Controllable Video Object Insertion via Multiview Priors
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2604.14556

Similar Items