| Main Authors: | Rahary, Adrien Ramanana; Dufour, Nicolas; Perez, Patrick; Picard, David |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2603.23488 |
| _version_ | 1866917408110804992 |
|---|---|
| author | Rahary, Adrien Ramanana; Dufour, Nicolas; Perez, Patrick; Picard, David |
| author_facet | Rahary, Adrien Ramanana; Dufour, Nicolas; Perez, Patrick; Picard, David |
| contents | Monocular novel-view synthesis has long required multi-view image pairs for supervision, limiting training data scale and diversity. We argue it is not necessary: one view is enough. We present OVIE, trained entirely on unpaired internet images. We leverage a monocular depth estimator as a geometric scaffold at training time: we lift a source image into 3D, apply a sampled camera transformation, and project to obtain a pseudo-target view. To handle disocclusions, we introduce a masked training formulation that restricts geometric, perceptual, and textural losses to valid regions, enabling training on 30 million uncurated images. At inference, OVIE is geometry-free, requiring no depth estimator or 3D representation. Trained exclusively on in-the-wild images, OVIE outperforms prior methods in a zero-shot setting, while being 600x faster than the second-best baseline. Code and models are publicly available at https://github.com/AdrienRR/ovie. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2603_23488 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | One View Is Enough! Monocular Training for In-the-Wild Novel View Generation |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2603.23488 |
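
The abstract's training-time recipe (lift a source image into 3D with estimated depth, apply a sampled camera transformation, project to a pseudo-target view, and restrict losses to valid regions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `warp_to_pseudo_target` and `masked_l1`, the pinhole intrinsics `K`, the nearest-pixel splatting, and the plain L1 loss standing in for the paper's geometric, perceptual, and textural losses are all assumptions made for illustration.

```python
import numpy as np


def warp_to_pseudo_target(image, depth, K, R, t):
    """Forward-warp `image` (H, W, C) into a new camera using per-pixel depth.

    Hypothetical helper: assumes a pinhole camera with intrinsics K (3x3) and a
    target pose X_target = R @ X_source + t. Returns the warped image and a
    boolean mask; False entries are holes (disocclusions or out-of-frame).
    """
    H, W = depth.shape
    C = image.shape[-1]

    # Homogeneous pixel grid, shape (3, H*W), in row-major pixel order.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)], axis=0)

    # Lift to 3D, move into the target camera frame, and project.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    pts_t = R @ pts + t.reshape(3, 1)
    z = pts_t[2]
    proj = K @ pts_t

    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)  # avoid dividing by non-positive depth
    x = np.rint(proj[0] / safe_z).astype(int)
    y = np.rint(proj[1] / safe_z).astype(int)
    ok = in_front & (x >= 0) & (x < W) & (y >= 0) & (y < H)

    # Z-buffer: sort nearest first, then keep the first hit per target pixel.
    idx = np.flatnonzero(ok)
    idx = idx[np.argsort(z[idx], kind="stable")]
    lin = y[idx] * W + x[idx]
    _, first = np.unique(lin, return_index=True)

    src = image.reshape(H * W, C)
    target = np.zeros((H * W, C), dtype=image.dtype)
    mask = np.zeros(H * W, dtype=bool)
    target[lin[first]] = src[idx[first]]
    mask[lin[first]] = True
    return target.reshape(H, W, C), mask.reshape(H, W)


def masked_l1(pred, target, mask):
    """Mean absolute error over valid pixels only (stand-in for the masked losses)."""
    m = mask[..., None].astype(pred.dtype)
    denom = max(float(m.sum()) * pred.shape[-1], 1.0)
    return float((np.abs(pred - target) * m).sum() / denom)
```

Under these assumptions, a model's prediction would be compared against the warped pseudo-target only where `mask` is true, so holes left by disocclusion contribute nothing to the loss. Per the abstract, none of this machinery is needed at inference time.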