Bibliographic Details
Main authors: Rahary, Adrien Ramanana, Dufour, Nicolas, Perez, Patrick, Picard, David
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online access: https://arxiv.org/abs/2603.23488
_version_ 1866917408110804992
author Rahary, Adrien Ramanana
Dufour, Nicolas
Perez, Patrick
Picard, David
author_facet Rahary, Adrien Ramanana
Dufour, Nicolas
Perez, Patrick
Picard, David
contents Monocular novel-view synthesis has long required multi-view image pairs for supervision, limiting training data scale and diversity. We argue it is not necessary: one view is enough. We present OVIE, trained entirely on unpaired internet images. We leverage a monocular depth estimator as a geometric scaffold at training time: we lift a source image into 3D, apply a sampled camera transformation, and project to obtain a pseudo-target view. To handle disocclusions, we introduce a masked training formulation that restricts geometric, perceptual, and textural losses to valid regions, enabling training on 30 million uncurated images. At inference, OVIE is geometry-free, requiring no depth estimator or 3D representation. Trained exclusively on in-the-wild images, OVIE outperforms prior methods in a zero-shot setting, while being 600x faster than the second-best baseline. Code and models are publicly available at https://github.com/AdrienRR/ovie.
format Preprint
id arxiv_https___arxiv_org_abs_2603_23488
institution arXiv
publishDate 2026
record_format arxiv
title One View Is Enough! Monocular Training for In-the-Wild Novel View Generation
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2603.23488
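
The abstract in the contents field describes a training-time pipeline: lift a source image into 3D with estimated depth, apply a sampled camera transformation, project to obtain a pseudo-target view, and restrict the losses to valid (non-disoccluded) regions. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The function names (make_pseudo_target, masked_l1), the pinhole intrinsics K, the nearest-pixel splatting with a z-buffer, and the L1 loss are all assumptions made for illustration; the record does not specify the warping scheme or the exact loss terms.

```python
import numpy as np


def make_pseudo_target(image, depth, K, R, t):
    """Forward-warp `image` (H, W, C) into a new camera (hypothetical helper).

    depth: (H, W) per-pixel depth, e.g. from a monocular depth estimator.
    K: (3, 3) pinhole intrinsics, assumed shared by both views.
    R, t: rotation (3, 3) and translation (3,) from source to target camera.
    Returns the warped image and a boolean mask of pixels that received a value.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)], axis=0)  # 3 x N

    # Lift: back-project every pixel to a 3D point in the source camera frame.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Transform: move the points into the target camera frame.
    pts_t = R @ pts + np.asarray(t, dtype=float).reshape(3, 1)
    # Project: apply the intrinsics and divide by depth.
    z = pts_t[2]
    proj = K @ pts_t
    safe_z = np.where(z > 1e-6, z, 1.0)
    u_t = np.rint(proj[0] / safe_z).astype(int)
    v_t = np.rint(proj[1] / safe_z).astype(int)
    inside = (z > 1e-6) & (u_t >= 0) & (u_t < w) & (v_t >= 0) & (v_t < h)

    # Z-buffer: when several points land on one target pixel, keep the nearest.
    idx = np.flatnonzero(inside)
    lin = v_t[idx] * w + u_t[idx]
    order = np.lexsort((z[idx], lin))  # primary key: target pixel, then depth
    lin_sorted = lin[order]
    first = np.ones(len(order), dtype=bool)
    first[1:] = lin_sorted[1:] != lin_sorted[:-1]
    winners = idx[order][first]

    colors = image.reshape(h * w, -1)
    target = np.zeros_like(image)
    mask = np.zeros((h, w), dtype=bool)
    target[v_t[winners], u_t[winners]] = colors[winners]
    mask[v_t[winners], u_t[winners]] = True
    return target, mask


def masked_l1(pred, target, mask):
    """Mean absolute error over valid pixels only (stand-in for the masked losses)."""
    n = int(mask.sum()) * pred.shape[-1]
    if n == 0:
        return 0.0
    return float(np.abs((pred - target) * mask[..., None]).sum() / n)
```

Pixels where the mask is False correspond to disocclusions: regions of the target view that no source pixel maps to. Excluding them from the loss keeps the model from being trained to reproduce the empty holes left by the warp.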