Bibliographic Details
Main Authors:	Rahary, Adrien Ramanana; Dufour, Nicolas; Perez, Patrick; Picard, David
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.23488

_version_	1866917408110804992
author	Rahary, Adrien Ramanana; Dufour, Nicolas; Perez, Patrick; Picard, David
author_facet	Rahary, Adrien Ramanana; Dufour, Nicolas; Perez, Patrick; Picard, David
contents	Monocular novel-view synthesis has long required multi-view image pairs for supervision, limiting training data scale and diversity. We argue it is not necessary: one view is enough. We present OVIE, trained entirely on unpaired internet images. We leverage a monocular depth estimator as a geometric scaffold at training time: we lift a source image into 3D, apply a sampled camera transformation, and project to obtain a pseudo-target view. To handle disocclusions, we introduce a masked training formulation that restricts geometric, perceptual, and textural losses to valid regions, enabling training on 30 million uncurated images. At inference, OVIE is geometry-free, requiring no depth estimator or 3D representation. Trained exclusively on in-the-wild images, OVIE outperforms prior methods in a zero-shot setting, while being 600x faster than the second-best baseline. Code and models are publicly available at https://github.com/AdrienRR/ovie.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_23488
institution	arXiv
publishDate	2026
record_format	arxiv
title	One View Is Enough! Monocular Training for In-the-Wild Novel View Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.23488
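
The training-time pipeline summarized in the contents field (lift a source image into 3D with estimated depth, apply a sampled camera transformation, project to a pseudo-target view, and restrict the loss to valid regions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `warp_to_pseudo_target` and `masked_l1` are hypothetical, the camera is assumed to be a simple pinhole with a single focal length and a centered principal point, pixels are splatted to the nearest target pixel with a z-buffer, and a plain L1 loss stands in for the geometric, perceptual, and textural losses the abstract mentions.

```python
import math


def warp_to_pseudo_target(image, depth, focal, rotation, translation):
    """Forward-warp a single-channel image into a new camera pose.

    image, depth: H x W nested lists (depth would come from a monocular
    depth estimator; here it is just an input).
    rotation: 3 x 3 nested list, translation: length-3 list (the sampled
    camera transformation, mapping source-camera to target-camera coords).
    Returns (target, mask); mask is False where no source pixel landed,
    i.e. in disoccluded or out-of-view regions.
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0  # assumed principal point
    target = [[0.0] * w for _ in range(h)]
    mask = [[False] * w for _ in range(h)]
    zbuf = [[math.inf] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            # Lift: back-project the pixel to a 3D point in the source camera.
            p = ((u - cx) * z / focal, (v - cy) * z / focal, z)
            # Transform: q = R p + t.
            q = [
                sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
                for i in range(3)
            ]
            if q[2] <= 0:
                continue  # point ends up behind the target camera
            # Project: pinhole projection, rounded to the nearest pixel.
            u2 = round(focal * q[0] / q[2] + cx)
            v2 = round(focal * q[1] / q[2] + cy)
            # Z-buffer: keep only the closest surface at each target pixel.
            if 0 <= u2 < w and 0 <= v2 < h and q[2] < zbuf[v2][u2]:
                zbuf[v2][u2] = q[2]
                target[v2][u2] = image[v][u]
                mask[v2][u2] = True
    return target, mask


def masked_l1(pred, target, mask):
    """Mean absolute error computed over valid (mask == True) pixels only."""
    total, count = 0.0, 0
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            if m:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    image = [[float(10 * v + u) for u in range(4)] for v in range(4)]
    depth = [[2.0] * 4 for _ in range(4)]
    # With focal == depth, a unit translation along x shifts content by one pixel.
    target, mask = warp_to_pseudo_target(image, depth, 2.0, identity, [1.0, 0.0, 0.0])
    for row in mask:
        print(row)
```

In this toy setup the leftmost target column receives no source pixel, so its mask entries are False and any prediction there contributes nothing to `masked_l1`. That is the role the abstract assigns to the masked formulation: the model is supervised only where the warped pseudo-target is geometrically valid.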
