| Main Authors: | Lu, Hongbo; Yao, Liang; He, Chenghao; Liu, Fan; Liao, Wenlong; He, Tao; Peng, Pai |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2603.17382 |
| _version_ | 1866915871551651840 |
|---|---|
| author | Lu, Hongbo; Yao, Liang; He, Chenghao; Liu, Fan; Liao, Wenlong; He, Tao; Peng, Pai |
| author_facet | Lu, Hongbo; Yao, Liang; He, Chenghao; Liu, Fan; Liao, Wenlong; He, Tao; Peng, Pai |
| contents | A fundamental bottleneck in Novel View Synthesis (NVS) for autonomous driving is the inherent supervision gap on novel trajectories: models are tasked with synthesizing unseen views during inference, yet lack ground truth images for these shifted poses during training. In this paper, we propose VisionNVS, a camera-only framework that fundamentally reformulates view synthesis from an ill-posed extrapolation problem into a self-supervised inpainting task. By introducing a "Virtual-Shift" strategy, we use monocular depth proxies to simulate occlusion patterns and map them onto the original view. This paradigm shift allows the use of raw, recorded images as pixel-perfect supervision, effectively eliminating the domain gap inherent in previous approaches. Furthermore, we address spatial consistency through a Pseudo-3D Seam Synthesis strategy, which integrates visual data from adjacent cameras during training to explicitly model real-world photometric discrepancies and calibration errors. Experiments demonstrate that VisionNVS achieves superior geometric fidelity and visual quality compared to LiDAR-dependent baselines, offering a robust solution for scalable driving simulation. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2603_17382 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | VisionNVS: Self-Supervised Inpainting for Novel View Synthesis under the Virtual-Shift Paradigm |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2603.17382 |