Staff View: HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning :: Library Catalog

Bibliographic Details
Main Authors:	Valassakis, Eugene; Garcia-Hernando, Guillermo
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning; Robotics
Online Access:	https://arxiv.org/abs/2407.15844

_version_	1866911964216688640
author	Valassakis, Eugene; Garcia-Hernando, Guillermo
author_facet	Valassakis, Eugene; Garcia-Hernando, Guillermo
contents	Predicting camera-space hand meshes from single RGB images is crucial for enabling realistic hand interactions in 3D virtual and augmented worlds. Previous work typically divided the task into two stages: given a cropped image of the hand, predict meshes in relative coordinates, followed by lifting these predictions into camera space in a separate and independent stage, often resulting in the loss of valuable contextual and scale information. To prevent the loss of these cues, we propose unifying these two stages into an end-to-end solution that addresses the 2D-3D correspondence problem. This solution enables back-propagation from camera space outputs to the rest of the network through a new differentiable global positioning module. We also introduce an image rectification step that harmonizes both the training dataset and the input image as if they were acquired with the same camera, helping to alleviate the inherent scale-depth ambiguity of the problem. We validate the effectiveness of our framework in evaluations against several baselines and state-of-the-art approaches across three public benchmarks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_15844
institution	arXiv
publishDate	2024
record_format	arxiv
title	HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning
topic	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning; Robotics
url	https://arxiv.org/abs/2407.15844