| Main authors: | Ali, Musawar; Carranza-García, Manuel; Fioraio, Nicola; Salti, Samuele; Di Stefano, Luigi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online access: | https://arxiv.org/abs/2602.05822 |
| _version_ | 1866908816110518272 |
|---|---|
| author | Ali, Musawar; Carranza-García, Manuel; Fioraio, Nicola; Salti, Samuele; Di Stefano, Luigi |
| contents | We propose NVS-HO, the first benchmark designed for novel view synthesis of handheld objects in real-world environments using only RGB inputs. Each object is recorded in two complementary RGB sequences: (1) a handheld sequence, where the object is manipulated in front of a static camera, and (2) a board sequence, where the object is fixed on a ChArUco board to provide accurate camera poses via marker detection. The goal of NVS-HO is to learn an NVS model that captures the full appearance of an object from (1), whereas (2) provides the ground-truth images used for evaluation. To establish baselines, we consider both a classical SfM pipeline and a state-of-the-art pre-trained feed-forward neural network (VGGT) as pose estimators, and train NVS models based on NeRF and Gaussian Splatting. Our experiments reveal significant performance gaps in current methods under unconstrained handheld conditions, highlighting the need for more robust approaches. NVS-HO thus offers a challenging real-world benchmark to drive progress in RGB-based novel view synthesis of handheld objects. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2602_05822 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | NVS-HO: A Benchmark for Novel View Synthesis of Handheld Objects |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2602.05822 |
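
The abstract states that board-sequence camera poses come from ChArUco marker detection. As a minimal sketch, assuming a PnP solver that returns an axis-angle rotation `rvec` and a translation `tvec` mapping board coordinates to camera coordinates (the convention of OpenCV's `solvePnP`), the camera pose expressed in the board frame is the inverse rigid transform. Because the object is fixed on the board, the board frame can serve as a consistent world frame for the object. The function names below are hypothetical; this record does not describe the authors' actual pose pipeline.

```python
import numpy as np


def rodrigues(rvec):
    """Convert an axis-angle vector (e.g. a PnP rvec) to a 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        # Negligible rotation angle: return the identity.
        return np.eye(3)
    k = rvec / theta
    # Skew-symmetric cross-product matrix of the unit rotation axis.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def board_to_camera(rvec, tvec):
    """4x4 transform mapping board coordinates to camera coordinates: X_cam = R X_board + t."""
    T = np.eye(4)
    T[:3, :3] = rodrigues(rvec)
    T[:3, 3] = np.asarray(tvec, dtype=float).reshape(3)
    return T


def camera_to_board(rvec, tvec):
    """Inverse rigid transform: pose of the camera expressed in the board (object) frame."""
    T = board_to_camera(rvec, tvec)
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T          # inverse of a rotation is its transpose
    T_inv[:3, 3] = -R.T @ t      # camera center in board coordinates
    return T_inv
```

For example, `camera_to_board([0, 0, 0], [0, 0, 2])` places the camera center at `(0, 0, -2)` in board coordinates. NeRF and Gaussian Splatting codebases commonly consume camera-to-world matrices, but their axis conventions (e.g. OpenCV versus OpenGL) differ and are not handled in this sketch.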