Bibliographic Details
Main Authors: Ali, Musawar; Carranza-García, Manuel; Fioraio, Nicola; Salti, Samuele; Di Stefano, Luigi
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2602.05822
_version_ 1866908816110518272
author Ali, Musawar
Carranza-García, Manuel
Fioraio, Nicola
Salti, Samuele
Di Stefano, Luigi
contents We propose NVS-HO, the first benchmark designed for novel view synthesis (NVS) of handheld objects in real-world environments using only RGB inputs. Each object is recorded in two complementary RGB sequences: (1) a handheld sequence, where the object is manipulated in front of a static camera, and (2) a board sequence, where the object is fixed on a ChArUco board to provide accurate camera poses via marker detection. The goal of NVS-HO is to learn an NVS model that captures the full appearance of an object from the handheld sequence (1), whereas the board sequence (2) provides the ground-truth images used for evaluation. To establish baselines, we consider both a classical SfM pipeline and a state-of-the-art pre-trained feed-forward neural network (VGGT) as pose estimators, and train NVS models based on NeRF and Gaussian Splatting. Our experiments reveal significant performance gaps in current methods under unconstrained handheld conditions, highlighting the need for more robust approaches. NVS-HO thus offers a challenging real-world benchmark to drive progress in RGB-based novel view synthesis of handheld objects.
format Preprint
id arxiv_https___arxiv_org_abs_2602_05822
institution arXiv
publishDate 2026
record_format arxiv
title NVS-HO: A Benchmark for Novel View Synthesis of Handheld Objects
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.05822