Bibliographic Details
Main Authors: Lu, Hongbo, Yao, Liang, He, Chenghao, Liu, Fan, Liao, Wenlong, He, Tao, Peng, Pai
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2603.17382
_version_ 1866915871551651840
author Lu, Hongbo
Yao, Liang
He, Chenghao
Liu, Fan
Liao, Wenlong
He, Tao
Peng, Pai
contents A fundamental bottleneck in Novel View Synthesis (NVS) for autonomous driving is the inherent supervision gap on novel trajectories: models are tasked with synthesizing unseen views during inference, yet lack ground truth images for these shifted poses during training. In this paper, we propose VisionNVS, a camera-only framework that fundamentally reformulates view synthesis from an ill-posed extrapolation problem into a self-supervised inpainting task. By introducing a "Virtual-Shift" strategy, we use monocular depth proxies to simulate occlusion patterns and map them onto the original view. This paradigm shift allows the use of raw, recorded images as pixel-perfect supervision, effectively eliminating the domain gap inherent in previous approaches. Furthermore, we address spatial consistency through a Pseudo-3D Seam Synthesis strategy, which integrates visual data from adjacent cameras during training to explicitly model real-world photometric discrepancies and calibration errors. Experiments demonstrate that VisionNVS achieves superior geometric fidelity and visual quality compared to LiDAR-dependent baselines, offering a robust solution for scalable driving simulation.
format Preprint
id arxiv_https___arxiv_org_abs_2603_17382
institution arXiv
publishDate 2026
record_format arxiv
title VisionNVS: Self-Supervised Inpainting for Novel View Synthesis under the Virtual-Shift Paradigm
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2603.17382